Semantics of Multithreaded Java

Jeremy Manson and William Pugh
Institute for Advanced Computer Science and Department of Computer Science
University of Maryland, College Park
{jmanson,pugh}@cs.umd.edu
January 11, 2002

This work was supported by National Science Foundation grants ACI9720199 and CCR9619808, and a gift from Sun Microsystems.

Abstract

Java has integrated multithreading to a far greater extent than most programming languages. It is also one of the only languages that specifies and requires safety guarantees for improperly synchronized programs. It turns out that understanding these issues is far more subtle and difficult than was previously thought. The existing specification makes guarantees that prohibit standard and proposed compiler optimizations; it also omits guarantees that are necessary for safe execution of much existing code. Some guarantees that are made (e.g., type safety) raise tricky implementation issues when running unsynchronized code on SMPs with weak memory models.

This paper reviews those issues. It proposes a new semantics for Java that allows for aggressive compiler optimization and addresses the safety and multithreading issues.

1 Introduction

Java has integrated multithreading to a far greater extent than most programming languages. One desired goal of Java is to be able to execute untrusted programs safely. To do this, we need to make safety guarantees for unsynchronized as well as synchronized programs. Even potentially malicious programs must have safety guarantees.

Pugh [Pug99, Pug00b] showed that the existing specification of the semantics of Java's memory model [GJS96, §17] has serious problems. However, the solutions proposed in the first paper [Pug99] were naïve and incomplete. The issue is far more subtle than anyone had anticipated.

Many of the issues raised in this paper have been discussed on a mailing list dedicated to the Java Memory Model [JMM]. There is a rough consensus on the solutions to these issues, and the answers proposed here are similar to those proposed in another paper [MS00] (by other authors) that arose out of those discussions. However, the details and the way in which those solutions are formalized are different.

The authors published a somewhat condensed version of this paper [MP01]. Some of the issues dealt with in this paper, such as improperly synchronized access to longs and doubles, were elided in that paper.

2 Memory Models

Almost all of the work in the area of memory models has been done on processor memory models. Programming language memory models differ in some important ways.

First, most programming languages offer some safety guarantees. An example of this sort of guarantee is type safety. These guarantees must be absolute: there must not be a way for a programmer to circumvent them.

Second, the run-time environment for a high level language contains many hidden data structures and fields that are not directly visible to a programmer (for example, the pointer to a virtual method table). A data race resulting in the reading of an unexpected value for one of these hidden fields could be impossible to debug and lead to substantial violations of the semantics of the high level language.

Third, some processors have special instructions for performing synchronization and memory barriers. In a programming language, some variables have special properties (e.g., volatile or final), but there is usually no way to indicate that a particular write should have special memory semantics.

Finally, it is impossible to ignore the impact of compilers and the transformations they perform. Many standard compiler transformations violate the rules of existing processor memory models [Pug00b].
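The interaction between compiler transformations and special variable properties can be illustrated with a small sketch. It is not taken from the paper: the class StopFlagDemo and its method workerStops are hypothetical names chosen for illustration. A worker thread spins on a flag that another thread sets. Because the flag is declared volatile, a compiler running on a current Java platform may not hoist the read out of the loop; if the volatile modifier were removed, such hoisting would be a permitted transformation and the loop might never terminate.

```java
// Hypothetical demo; class and method names are illustrative, not from the paper.
final class StopFlagDemo {
    // volatile: the worker's read of this field may not be hoisted out of its loop.
    private static volatile boolean stop;

    /** Starts a spinning worker, requests a stop, and reports whether it exited in time. */
    static boolean workerStops(long timeoutMillis) {
        stop = false;
        Thread worker = new Thread(() -> {
            while (!stop) {
                // busy-wait until another thread sets the flag
            }
        });
        worker.setDaemon(true); // do not keep the JVM alive if the sketch misbehaves
        worker.start();
        stop = true;            // volatile write that the worker must eventually observe
        try {
            worker.join(timeoutMillis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        return !worker.isAlive();
    }

    public static void main(String[] args) {
        System.out.println("worker stopped: " + workerStops(5_000));
    }
}
```

The timeout passed to join is an arbitrary choice for the sketch; it only bounds how long the caller waits before giving up.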
2.1 Terms and Definitions

In this paper, we concern ourselves with the semantics of the Java virtual machine [LY99]. While defining a semantics for Java source programs is important, there are many issues that arise only in the JVM that also need to be resolved. Informally, the semantics of Java source programs is understood to be defined by their straightforward translation into classfiles, and then by interpreting the classfiles using the JVM semantics.

A variable refers to a static variable of a loaded class, a field of an allocated object, or element of an allocated array. The system must maintain the following properties with regards to variables and the memory manager:

  • It must be impossible for any thread to see a variable before it has been initialized to the default value for the type of the variable.

  • The fact that a garbage collection may relocate a variable to a new memory location is immaterial and invisible to the semantics.

  • The fact that two variables may be stored in adjacent bytes (e.g., in a byte array) is immaterial. Two variables can be simultaneously updated by different threads without needing to use synchronization to account for the fact that they are "adjacent". Any word-tearing must be invisible to the programmer.

3 Proposed Informal Semantics

The proposed informal semantics are very similar to lazy release consistency [CZ92, GLL+90]. A formal operational semantics is provided in Section 8.

All Java objects act as monitors that support reentrant locks. For simplicity, we treat the monitor associated with each Java object as a separate variable. The only actions that can be performed on the monitor are Lock and Unlock actions. A Lock action by a thread blocks until the thread can obtain an exclusive lock on the monitor.

The actions on individual monitors and volatile fields are executed in a sequentially consistent manner (i.e., there must exist a single, global, total execution order over these actions that is consistent with the order in which the actions occur in their original threads). Actions on volatile fields are always immediately visible to other threads, and do not need to be guarded by synchronization.

If two threads access a normal variable, and one of those accesses is a write, then the program should be synchronized so that the first access is visible to the second access. When a thread T1 acquires a lock on/enters a monitor m that was previously held by another thread T2, all actions that were visible to T2 at the time it released the lock on m become visible to T1.

If thread T1 starts thread T2, then all actions visible to T1 at the time it starts T2 become visible to T2 before T2 starts. Similarly, if T1 joins with T2 (waits for T2 to terminate), then all accesses visible to T2 when T2 terminates are visible to T1 after the join completes.

When a thread T1 reads a volatile field v that was previously written by a thread T2, all actions that were visible to T2 at the time T2 wrote to v become visible to T1. This is a strengthening of volatile over the existing semantics. The existing semantics make it very difficult to use volatile fields to communicate between threads, because you cannot use a signal received via a read of a volatile field to guarantee that writes to non-volatile fields are visible. With this change, many broken synchronization idioms (e.g., double-checked locking [Pug00a]) can be fixed by declaring a single field volatile.

There are two reasons that a value written to a variable might not be available to be read after it becomes visible to a thread. First, another write to that variable in the same thread can overwrite the first value. Second, additional synchronization can provide a new value for the variable in the ways described above. Between the time the write becomes visible and the time the thread no longer can read that value from that variable, the write is said to be eligible to be read.

When programs are not properly synchronized, very surprising behaviors are allowed.

There are additional rules associated with final fields (Section 5) and finalizers (Section 6).

4 Safety guarantees

Java allows untrusted code to be executed in a sandbox with limited access rights. The set of actions allowed in a sandbox can be customized and depends upon interaction with a security manager, but the ability to execute code in this manner is essential. In a language that allows casts between pointers and integers, or in a language without garbage collection, any such guarantee is impossible. Even for code that is written by someone you trust not to act maliciously, safety guarantees are important: they limit the possible effects of an error.

Safety guarantees need to be enforced regardless of
whether a program contains a synchronization error or data race.

In this section, we go over the implementation issues involved in enforcing certain virtual machine safety guarantees, and the issues involved in writing libraries that promise higher level safety guarantees.

4.1 VM Safety guarantees

Consider execution of the code on the left of Figure 1a on a multiprocessor with a weak memory model (all of the ri variables are intended to be registers that do not require memory references). Can this result in r2 = -1? For this to happen, the write to p must precede the read of p, and the read of *r1 must precede the write to y.

It is easy to see how this could happen if the MemBar (Memory Barrier) instruction were not present. A MemBar instruction usually requires that actions that have been initiated are completed before any further actions can be taken. If a compiler or the processor tries to reorder the statements in Thread 1 (leading to r2 = -1), then a MemBar would prevent that reordering. Given that the instructions in thread 1 cannot be reordered, you might think that the data dependence in thread 2 would prohibit seeing r2 = -1. You'd be wrong. The Alpha memory model allows the result r2 = -1. Existing implementations of the Alpha do not actually reorder the instructions. However, some Alpha processors can fulfill the r2 = *r1 instruction out of a stale cache line, which has the same effect. Future implementations may use value prediction to allow the instructions to be executed out of order.

Stronger memory orders, such as TSO (Total Store Order), PSO (Partial Store Order) and RMO (Relaxed Memory Order) would not allow this reordering. Sun's SPARC chip typically runs in TSO mode, and Sun's new MAJC chip implements RMO. Intel's IA-64 memory model does not allow r2 = -1; the IA-32 has no memory barrier instructions or formal memory model (the implementation changes from chip to chip), but many knowledgeable experts have claimed that no IA-32 implementation would allow the result r2 = -1 (assuming an appropriate ordering instruction was used instead of the memory barrier).

Now consider Figure 1b. This is very similar to Figure 1a, except that y is replaced by heap allocated memory for a new instance of Point. What happens if, when Thread 2 reads Foo.p, it sees the address written by Thread 1, but it doesn't see the writes performed by Thread 1 to initialize the instance?

When thread 2 reads r2.x, it could see whatever was in that memory location before it was allocated from the heap. If that memory was uninitialized before allocation, an arbitrary value could be read. This would obviously be a violation of Java semantics. If r2.x were a reference/pointer, then seeing a garbage value would violate type safety and make any kind of security/safety guarantee impossible.

One solution to this problem is to allocate objects out of memory that all threads know to have been zeroed (perhaps at GC time). This would mean that if we see an early/stale value for r2.x, we see a zero or null value. This is type safe, and happens to be the default value the field is initialized with before the constructor is executed.

Now consider Figure 1c. When thread 2 dispatches hashCode(), it needs to read the virtual method table of the object referenced by r2. If we use the idea suggested previously of allocating objects out of pre-zeroed memory, then the repercussions of seeing a stale value for the vptr are limited to a segmentation fault when attempting to load a method address out of the virtual method table. Other operations such as arraylength, instanceof and checkcast could also load header fields and behave anomalously.

But consider what happens if the creation of the Bar object by Thread 1 is the very first time Bar has been referenced. This forces the loading and initialization of class Bar. Then not only might thread 2 see a stale value in the instance of Bar, it could also see a stale value in any of the data structures or code loaded for class Bar. What makes this particularly tricky is that thread 2 has no indication that it might be about to execute code of a class that has just been loaded.

4.1.1 Proposed VM Safety Guarantees

Synchronization errors can only cause surprising or unexpected values to be returned from a read action (i.e., a read of a field or array element). Other actions, such as getting the length of an array, performing a checked cast or invoking a virtual method, behave normally. They cannot throw any exceptions or errors because of a data race, cause the VM to crash or be corrupted, or behave in any other way not allowed by the semantics.

Values returned by read actions must be both type-safe and "not out of thin air". To say that a value must be "not out of thin air" means that it must be a value written previously to that variable by some thread. For example, Figure 9 must not be able to produce any result other than i == j == 0; the value 42 cannot be assigned to i and j as if by "magic". The exception to this is that incorrectly synchronized reads of non-volatile longs and doubles
are not required to respect the "not out of thin air" rule (see Section 8.8 for details).

    (a) Initially: p = &x; x = 1; y = -1

        Thread 1            Thread 2
        y = 2               r1 = p
        MemBar              r2 = *r1
        p = &y

        Could result in r2 = -1

    (b) Initially: Foo.p = new Point(1,2)

        Thread 1                Thread 2
        r1 = new Point(3,4)     r2 = Foo.p
        MemBar                  r3 = r2.x
        Foo.p = r1

        Could result in r3 = 0 or garbage

    (c) Initially: Foo.o = "Hello"

        Thread 1                Thread 2
        r1 = new Bar(3,4)       r2 = Foo.o
        MemBar                  r3 = r2.hashCode()
        Foo.o = r1

        Could result in almost anything

Figure 1: Surprising results from weak memory models

4.2 Library Safety guarantees

Many programmers assume that immutable objects (objects that do not change once they are constructed) do not need to be synchronized. This is only true for programs that are otherwise correctly synchronized. However, if a reference to an immutable object is passed between threads without correct synchronization, then synchronization within the methods of the object is needed to ensure that the object actually appears to be immutable.

The motivating example is the java.lang.String class. This class is typically implemented using a length, offset, and reference to an array of characters. All of these are immutable (including the contents of the array), although in existing implementations are not declared final.

The problem occurs if thread 1 creates a String object S, and then passes a reference to S to thread 2 without using synchronization. When thread 2 reads the fields of S, those reads are improperly synchronized and can see the default values for the fields of S. Later reads by thread 2 can then see the values set by thread 1.

As an example of how this can affect a program, it is possible to show that a String that is supposed to be immutable can appear to change from "/tmp" to "/usr". Consider an implementation of StringBuffer whose substring method creates a string using the StringBuffer's character array. It only creates a new array for the new String if the StringBuffer is changed. We create a String using new StringBuffer("/usr/tmp").substring(4);. This will produce a string with an offset field of 4 and a length of 4. If thread 2 incorrectly sees an offset with the default value of 0, it will think the string represents "/usr" rather than "/tmp". This behavior can only occur on systems with weak memory models, such as an Alpha SMP.

Under the existing semantics, the only way to prohibit this behavior is to make all of the methods and constructors of the String class synchronized. This solution would incur a substantial performance penalty. The impact of this is compounded by the fact that the synchronization is not necessary on all platforms, and even then is only required when the code contains a data race.

If an object contains mutable data fields, then synchronization is required to protect the class against attack via data race. For objects with immutable data fields, we propose allowing the class to be defended by use of final fields.

5 Guarantees for Final fields

Final fields must be assigned exactly once in the constructor for the class that defines them. The existing Java memory model contains no discussion of final fields. In fact, at each synchronization point, final fields need to be reloaded from memory just like normal fields.

We propose additional semantics for final fields. These semantics will allow more aggressive optimizations of final fields, and allow them to be used to guard against attack via data race.

5.1 When these semantics matter

The semantics defined here are only significant for programs that either:

  • Allow objects to be made visible to other threads before the object is fully constructed

  • Have data races.

We strongly recommend against allowing objects to escape during construction. Since this is simply a matter of writing constructors correctly, it is not too difficult a task. While we also recommend against
data races, defensive programming may require considering that a user of your code may deliberately introduce a data race, and that there is little or nothing you can do to prevent it.

5.2 Final fields of objects that escape their constructors

Figure 2 shows an example of where the existing specification requires final fields to be reloaded. In this example, the object being constructed is made visible to another thread before the final field is assigned. That thread reads the final field, waits to be signaled that the constructor has assigned the final field, and then reads the final field again. The current specification guarantees that even if the first read of x in run() sees 0, the second read will see 42.

    class ReloadFinal extends Thread {
        final int x;
        ReloadFinal() {
            synchronized(this) {
                start();
                sleep(10);
                x = 42;
            }
        }
        public void run() {
            int i,j;
            i = x;
            synchronized(this) {
                j = x;
            }
            System.out.println(i + ", " + j);
            // j must be 42, even if i is 0
        }
    }

Figure 2: Final fields must be reloaded under existing semantics

The (informal) rule for final fields is that you must ensure that the constructor for an object has completed before another thread is allowed to load a reference to that object. These are called "properly constructed" final fields. We will deal with the semantics of properly constructed final fields first, and then come to the semantics of improperly constructed final fields.

5.3 Informal semantics of final fields

The formal detailed semantics for final fields are given in Section 8.7. For now, we just describe the informal semantics of final fields that are constructed properly.

The first part of the semantics of final fields is:

F1 When a final field is read, the value read is the value assigned in the constructor.

Consider the scenario postulated at the bottom of Figure 3. The question is: which of the variables i1 - i7 are guaranteed to see the value 42?

F1 alone guarantees that i1 is 42. However, that rule isn't sufficient to make Strings absolutely immutable. Strings contain a reference to an array of characters; the contents of that array must be seen to be immutable in order for the String to be immutable. Unfortunately, there is no way to declare the contents of an array as final in Java. Even if you could, it would mean that you couldn't reuse the mutable character buffer from a StringBuffer in constructing a String.

To use final fields to make Strings immutable requires that when we read a final reference to an array, we see both the correct reference to the array and the correct contents of the array. Enforcing this should guarantee that i2 is 42. For i3, the relevant question is: do the contents of the array need to be set before the final field is set (i.e., i3 might not be 42), or merely before the constructor completes (i3 must be 42)?

Although this point is debatable, we believe that a requirement for objects to be completely initialized before they are assigned to final fields would often be ignored or incorrectly performed. Thus, we recommend that the semantics only require that such objects be initialized before the constructor completes.

Since i4 is very similar to i2, it should clearly be 42. What about i5? It is reading the same location as i4. However, simple compiler optimizations would simply reuse the value loaded for j as the value of i5. Similarly, a processor using the Sparc RMO memory model would only require a memory barrier at the end of the constructor to guarantee that i4 is 42. However, ensuring that i5 is 42 under RMO would require a memory barrier by the reading thread. For these reasons, we recommend that the semantics not require that i5 be 42.

All of the examples to this point have dealt with references to arrays. However, it would be very confusing if these semantics applied only to array elements and not to object fields. Thus, the semantics should require that i6 is 42.

We need to decide if these special semantics apply only to the fields/elements of the object/array directly referenced, or if they apply to those referenced indirectly. If the semantics apply to indirectly referenced fields/elements, then i7 must be 42. We believe making the semantics apply only to directly referenced fields would be difficult to program correctly, so we recommend that i7 be required to be 42.

    class FinalTest {
        public static FinalTest ft;
        public static int [] x = new int[1];
        public final int a;
        public final int [] b,c,d;
        public final Point p;
        public final int [][] e;

        public FinalTest(int i) {
            a = i;

            int [] tmp = new int[1];
            tmp[0] = i;
            b = tmp;

            c = new int[1];
            c[0] = i;

            FinalTest.x[0] = i;
            d = FinalTest.x;

            p = new Point();
            p.x = i;

            e = new int[1][1];
            e[0][0] = i;
        }

        static void foo() {
            int [] myX = FinalTest.x;
            int j = myX[0];
            FinalTest f1 = ft;
            if (f1 == null) return;
                                  // Guaranteed to see value
                                  // set in constructor?
            int i1 = f1.a;        // yes
            int i2 = f1.b[0];     // yes
            int i3 = f1.c[0];     // yes
            int i4 = f1.d[0];     // yes
            int i5 = myX[0];      // no
            int i6 = f1.p.x;      // yes
            int i7 = f1.e[0][0];  // yes
            // use j, i1 ... i7
        }
    }

    // Thread 1:
    // FinalTest.ft = new FinalTest(42);

    // Thread 2:
    // FinalTest.foo();

Figure 3: Subtle points of the revised semantics of final

To formalize this idea, we say that a read r2 is derived from a read r1 if

  • r2 is a read of a field or element of an address that was returned by r1, or

  • there exists a read r3 such that r3 is derived from r1 and r2 is derived from r3.

Thus, the additional semantics for final fields are:

F2 Assume thread T1 assigns a value to a final field f of object X defined in class C. Assume that T1 does not allow any other thread to load a reference to X until after the C constructor for X has terminated. Thread T2 then reads field f of X. Any writes done by T1 before the class C constructor for object X terminates are guaranteed to be ordered before and visible to any reads done by T2 that are derived from the read of f.

5.4 Improperly Constructed Final Fields

Conditions [F1] and [F2] suffice if the object which contains the final field is not made visible to another thread before its constructor ends. Additional semantics are needed to describe the behavior of a program that allows references to objects to escape their constructor.

The basic question of what should be read from a final field which is improperly constructed is a simple one. In order to maintain not-out-of-thin-air safety, it is necessary that the value read out of such a final field is either the default value for its type, or the value written to it in its constructor.

Figure 4 demonstrates some of the issues with improperly synchronized final fields. The variables proper and improper refer to the same object. proper points to the correctly constructed version of the object, because the reference was written to it after the constructor completed. improper is not guaranteed to point to the correctly constructed version of the object, because it was set before the object was fully constructed.

When thread 2 reads the improperly constructed reference into i, and tries to read the final field p through that reference, we cannot make the guarantee that the constructor has finished. The resulting value of i1 may be either a reference to the point or the default value for that field (which is null).

If i1 is not null, and we then try to read i1.x, should we be forced to see the correctly constructed value of 42? After all, the write to improper occurred after the write of 42; one line of reasoning would suggest that if you can see the write to improper, you should be able to see the write to improper.x. This is not the case, however. The write to improper can be reordered to before the write to improper.x. Therefore, i2 can have either the value 42 or the value 0.

Because we have guaranteed that p will not be null, the reads from p should return the correctly constructed values for the fields. This is discussed in Section 5.3.

Now we come to i3 and i4. It is not unreasonable, initially, to believe that i3 and i4 should have the correct values in them. After all, we have just ensured that the thread has seen that object; it has been referenced through p. However, the compiler could reuse the values of i1 and i2 for i3 and i4 through common subexpression elimination. The values for i3 and i4 must therefore remain the same as those of i1 and i2.

5.5 Final Static Fields

Final static fields must be initialized by the class initializer for the class in which they are defined. The semantics for class initialization guarantee that any thread that reads a static field sees all the results of the execution of the class initialization.

Note that final static fields do not have to be reloaded at synchronization points.

Under certain complicated circumstances involving circularities in class initialization, it is possible for a thread to access the static variables of a class before the static initializer for that class has started. Under such situations, a thread which accesses a final static field before it has been set sees the default value for the field. This does not otherwise affect the nature or property of the field (any other threads that read the static field will see the final value set in the class initializer). No special semantics or memory barriers are required to observe this behavior; the standard memory barriers required for class initialization ensure it.

5.6 Native code changing final fields

JNI allows native code to change final fields. To allow optimization (and sane understanding) of final fields, that ability will be prohibited. Attempting to use JNI to change a final field should throw an immediate exception.
    class Improper {
        public final Point p;
        public static Improper proper;
        public static Improper improper;

        public Improper(int i) {
            p = new Point();
            p.x = i;
            improper = this;
        }

        static void foo() {
            Improper p = proper;
            Improper i = improper;
            if (p == null) return;
                                // Possible Results
            Point i1 = i.p;     // reference to point or null
            int i2 = i.p.x;     // 42 or 0
            Point p1 = p.p;     // reference to point
            int p2 = p.p.x;     // 42
            Point i3 = i.p;     // reference to point or null
            int i4 = i.p.x;     // 42 or 0
        }
    }

    // Thread 1:
    // Improper.proper = new Improper(42);

    // Thread 2:
    // Improper.foo();

Figure 4: Improperly Constructed Final Fields
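For contrast with Figure 4, the following is a minimal sketch of a "properly constructed" object in the sense of Section 5.2. It is not from the paper: the classes Coordinates and ProperlyConstructed and the methods create and readX are hypothetical. The constructor never stores this anywhere, and the reference is published only after the constructor has returned, so condition [F2] would apply to reads derived from the final field p, even though the publication itself is racy.

```java
// Hypothetical classes for illustration; they are not part of the paper.
final class Coordinates {
    int x; // deliberately mutable, like Point in Figure 4
}

final class ProperlyConstructed {
    public final Coordinates p;

    // Racy publication slot: neither volatile nor protected by a lock.
    static ProperlyConstructed shared;

    private ProperlyConstructed(int value) {
        p = new Coordinates();
        p.x = value;
        // No "shared = this" here: the reference must not escape the constructor.
    }

    /** Publishes the object only after its constructor has returned. */
    static ProperlyConstructed create(int value) {
        ProperlyConstructed obj = new ProperlyConstructed(value);
        shared = obj;
        return obj;
    }

    /** Reader side: the read of c.x is derived from the read of the final field p. */
    static int readX(ProperlyConstructed obj) {
        Coordinates c = obj.p; // read of the final field
        return c.x;            // derived read
    }
}
```

A reader in another thread that loads shared and finds it non-null could call readX on it. The design choice is the static factory method, which keeps publication out of the constructor.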
  • 9. 5.6.1 Write Protected Fields

System.in, System.out, and System.err are final static fields that are changed by the methods System.setIn, System.setOut and System.setErr. This is done by having the methods call native code that modifies the final fields. We need to create a special rule to handle this situation.

These fields should have been accessed via getter methods (e.g., System.getIn()). However, it would be impossible to make that change now. If we simply made the fields non-final, then untrusted code could change the fields, which would also be a serious problem (functions such as System.setIn have to get permission from the security manager).

The (ugly) solution for this is to create a new kind of field, write protected, and declare these three fields (and only these fields) as write protected. They would be treated as normal variables, except that the JVM would reject any bytecode that attempts to modify them. In particular, they need to be reloaded at synchronization points.

6 Guarantees for Finalizers

When an object is no longer reachable, the finalize() method (i.e., the finalizer) for the object may be invoked. The finalizer is typically run in a separate finalizer thread, although there may be more than one such thread.

The loss of the last reference to an object acts as an asynchronous signal to another thread to invoke the finalizer. In many cases, finalizers should be synchronized, because the finalizers of an unreachable but connected set of objects can be invoked simultaneously by different threads. However, in practice finalizers are often not synchronized. To naïve users, it seems counter-intuitive to synchronize finalizers.

Why is it hard to make guarantees? Consider the code in Figure 5. If foo() is invoked, an object is created and then made unreachable. What is guaranteed about the reads in the finalizer?

    class FinalizerTest {
      static int x = 0;
      int y = 0;
      static int z = 0;

      protected void finalize() {
        int i = FinalizerTest.x;
        int j = y;
        int k = FinalizerTest.z;
        // use i, j and k
      }

      public static void foo() {
        FinalizerTest ft = new FinalizerTest();
        FinalizerTest.x = 1;
        ft.y = 1;
        FinalizerTest.z = 1;
        ft = null;
      }
    }

    Figure 5: Subtle issues involving finalization

An aggressive compiler and garbage collector may realize that after the assignment to ft.y, all references to the object are dead and thus the object is unreachable. If garbage collection and finalization were performed immediately, the write to FinalizerTest.z would not have been performed and would not be visible.

But if the compiler reorders the assignments to FinalizerTest.x and ft.y, the same would hold for FinalizerTest.x. However, the object referenced by ft is clearly reachable at least until the assignment to ft.y is performed.

So the guarantee that can be reasonably made is that all memory accesses to the fields of an object X during normal execution are ordered before all memory accesses to the fields of X performed during the invocation of the finalizer for X. Furthermore, all memory accesses visible to the constructing thread at the time it completes the construction of X are visible to the finalizer for X. For a uniprocessor garbage collector, or a multiprocessor garbage collector that performs a global memory barrier (a memory barrier on all processors) as part of garbage collection, this guarantee should be free.

For a garbage collector that doesn't "stop the world", things are a little trickier. When an object with a finalizer becomes unreachable, it must be put into a special queue of unreachable objects. The next time a global memory barrier is performed, all of the objects in the unreachable queue get moved to a finalizable queue, and it now becomes safe to run their finalizers. There are a number of situations that will cause global memory barriers (such as class initialization), and they can also be performed periodically or when the queue of unreachable objects grows too large.
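The two-queue scheme described for collectors that do not stop the world can be sketched as a toy model. This is only an illustration under stated assumptions: the class name FinalizationQueues and its methods are hypothetical, finalizers are represented as Runnable objects, and the global memory barrier is reduced to a method call; no real collector or JVM interface is implied.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

/** Toy model of the unreachable/finalizable queues; not a real garbage collector. */
class FinalizationQueues {
    private final Queue<Runnable> unreachable = new ArrayDeque<>();
    private final Queue<Runnable> finalizable = new ArrayDeque<>();

    /** The collector found an object with a finalizer to be unreachable. */
    void objectBecameUnreachable(Runnable finalizer) {
        unreachable.add(finalizer);
    }

    /** After a global memory barrier, every queued object is safe to finalize. */
    void globalMemoryBarrier() {
        finalizable.addAll(unreachable);
        unreachable.clear();
    }

    /** Runs only the finalizers known to be safe and returns how many ran. */
    int runSafeFinalizers() {
        int ran = 0;
        Runnable f;
        while ((f = finalizable.poll()) != null) {
            f.run();
            ran++;
        }
        return ran;
    }

    public static void main(String[] args) {
        FinalizationQueues gc = new FinalizationQueues();
        List<String> log = new ArrayList<>();
        gc.objectBecameUnreachable(() -> log.add("finalized X"));
        System.out.println(gc.runSafeFinalizers()); // nothing is safe before the barrier
        gc.globalMemoryBarrier();
        System.out.println(gc.runSafeFinalizers()); // the queued finalizer runs now
        System.out.println(log);
    }
}
```

The point of the sketch is the ordering: a finalizer is never run between the moment its object becomes unreachable and the next barrier, which is what makes the visibility guarantee cheap to provide.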
  • 10.
    Thread 1:                      Thread 2:
    while (true)                   synchronized (o) {
      synchronized (o) {             // does nothing.
        // does not call           }
        // Thread.yield(),
        // Thread.sleep()
      }

    Figure 6: Fairness

7 Fairness Guarantees

Without a fairness guarantee for virtual machines, it is possible for a running thread to be capable of making progress and never do so. Java currently has no official fairness guarantee, although, in practice, most JVMs do provide it to some extent. An example of a potential weak fairness guarantee would be one that states that if a thread is infinitely often allowed to make progress, it would eventually do so.

An example of how this issue can impact a program can be seen in Figure 6. Without a fairness guarantee, it is perfectly legal for a compiler to move the while loop inside the synchronized block; Thread 2 will be blocked forever.

Any potential fairness guarantee would be inextricably linked to the threading model for a given virtual machine. A threading model that only switches threads when Thread.yield() is called will never allow Thread 2 to execute. A fairness guarantee would make this sort of implementation, which is used in a number of JVMs, illegal; it would force Thread 2 to be scheduled. Because this kind of implementation is often desirable, our proposed specification does not include a fairness guarantee.

The flip side of this issue is the fact that library calls like Thread.yield() and Thread.sleep() are given no meaningful semantics by the Java API. The question of whether they should have one is outside the scope of this discussion, which centers on VM issues, not API changes.

8 Formal Specification

The following is a formal, operational semantics for multithreaded Java. It isn't intended to be a method anybody would use to implement Java. A JVM implementation is legal iff for any execution observed on the JVM, there is an execution under these semantics that is observationally equivalent.

8.1 Operations

An operation corresponds to one JVM opcode. A getfield, getstatic or array load opcode corresponds to a Read. A putfield, putstatic or array store opcode corresponds to a Write. A monitorenter opcode corresponds to a Lock, and a monitorexit opcode corresponds to an Unlock.

8.2 Simple Semantics, excluding Final Fields and Prescient Writes

Establishing adequate rules for final fields and prescient writes is difficult, and substantially complicates the semantics. We will first present a version of the semantics that does not allow for either of these.

8.2.1 Types and Domains

value  A primitive value (e.g., int) or a reference to an object.

variable  Static variable of a loaded class, a field of an allocated object, or element of an allocated array.

GUID  A globally unique identifier assigned to each dynamic occurrence of write. This allows, for example, two writes of 42 to a variable v to be distinguished.

write  A tuple of a variable, a value (the value written to the variable), and a GUID (to distinguish this write from other writes of the same value to the same variable).

8.3 Simple Semantics

The model is a global system that atomically executes one operation from one thread in each step. This creates a total order over the execution of all operations. Within each thread, operations are usually done in their original order. The exception is that writes and stores may be done presciently, i.e., executed early (§8.5.1). Even without prescient writes, the process that decides what value is seen by a read is complicated and nondeterministic; the end result is not sequential consistency.

There is a set allWrites that denotes the set of all writes performed by any thread to any variable. For any set S of writes, S(v) ⊆ S is the set of writes to v in S.

For each thread t, at any given step, overwritten_t is the set of writes that thread t knows are overwritten and previous_t is the set of all writes that thread t
  • 11. knows occurred previously. It is an invariant that for all t,

    overwritten_t ⊂ previous_t ⊆ allWrites

Furthermore, all of these sets are monotonic: they can only grow.

When each variable v is created, there is a write w of the default value to v s.t. allWrites(v) = {w} and for all t, overwritten_t(v) = {} and previous_t(v) = {w}.

When thread t reads a variable v, the value returned is that of an arbitrary write from the set

    allWrites(v) − overwritten_t

This is the set of writes that are eligible to be read by thread t for variable v. Every monitor and volatile variable x has an associated overwritten_x and previous_x set. Synchronization actions cause information to be exchanged between a thread's previous and overwritten sets and those of a monitor or volatile. For example, when thread t locks monitor m, it performs previous_t ∪= previous_m and overwritten_t ∪= overwritten_m. The semantics of Read, Write, Lock and Unlock actions are given in Figure 7.

If your program is properly synchronized, then whenever thread t reads or writes a variable v, you must have done synchronization in a way that ensures that all previous writes of that variable are known to be in previous_t. In other words,

    previous_t(v) = allWrites(v)

From that, you can do an induction proof that initially and before and after thread t reads or writes a variable v,

    |allWrites(v) − overwritten_t| = 1

Thus, the value of v read by thread t is always the most recent write of v: allWrites(v) − overwritten_t. In a correctly synchronized program, there will therefore only be one eligible value for any variable in any thread at a given time. This results in sequential consistency.

8.4 Explicit Thread Communication

Starting, interrupting or detecting that a thread has terminated all have special synchronization semantics, as does initializing a class. Although we could add special rules to Figure 7 for these operations, it is easier to describe them in terms of the semantics of hidden volatile fields.

1. Associated with each thread T1 is a hidden volatile start field. When thread T2 starts T1, it is as though T2 writes to the start field, and the very first action taken by T1 is to read that field.

2. When a thread T1 terminates, as its very last action it writes to a hidden volatile terminated field. Any action that allows a thread T2 to detect that T1 has terminated is treated as a read of this field. These actions include:

  • Calling join() on T1 and having it return due to thread termination.
  • Calling isAlive() on T1 and having it return false because T1 has terminated.
  • Being in a shutdownHook thread after termination of T1, where T1 is a non-daemon thread that terminated before virtual machine shutdown was initiated.

3. When thread T2 interrupts or stops T1, it is as though T2 writes to a hidden volatile interrupted field of T1, that is read by T1 when it detects or receives the interrupt/threadDeath.

4. After a thread T1 initializes a class C, but before releasing the lock on C, it writes "true" to a hidden volatile static field initialized of C. If another thread T2 needs to check that C has been initialized, it can just check that the initialized field has been set to true (which would be a read of the volatile field). T2 does not need to obtain a lock on the class object for C if it detects that C is already initialized.

8.5 Semantics with Prescient Writes

In this section, we add prescient writes to our semantics.

8.5.1 Need for Prescient Writes

Consider the example in Figure 8. If the actions must be executed in their original order, then one of the reads must happen first, making it impossible to get the result i == j == 1. However, a compiler might decide to reorder the statements in each thread, which would allow this result.

In order to allow standard compiler optimizations to be performed, we need to allow Prescient Writes. A compiler may move a write earlier than it would be executed by the original program if the following conditions are absolutely guaranteed:
  • 12.
1. The write will happen (with the variable and value written guaranteed as well).

2. The prescient write can not be seen in the same thread before the write would normally occur.

3. Any premature reads of the prescient write must not be observable as a previousRead via synchronization.

When we say that something is guaranteed, this includes the fact that it must be guaranteed over all possible results from improperly synchronized reads (which are non-deterministic, because |allWrites(v) − overwritten_t| > 1). Figure 9 shows an example of a behavior that could be considered "consistent" (in a very perverted sense) if prescient writes were not required to be guaranteed across non-deterministic reads (the value of 42 appears out of thin air in this example).

    writeNormal(Write ⟨v, w, g⟩)
      overwritten_t ∪= previous_t(v)
      previous_t += ⟨v, w, g⟩
      allWrites += ⟨v, w, g⟩

    readNormal(Variable v)
      Choose ⟨v, w, g⟩ from allWrites(v) − overwritten_t
      return w

    lock(Monitor m)
      Acquire/increment lock on m
      previous_t ∪= previous_m;
      overwritten_t ∪= overwritten_m;

    unlock(Monitor m)
      previous_m ∪= previous_t;
      overwritten_m ∪= overwritten_t;
      Release/decrement lock on m

    readVolatile(Variable v)
      previous_t ∪= previous_v;
      overwritten_t ∪= overwritten_v;
      return volatileValue_v

    writeVolatile(Write ⟨v, w, g⟩)
      volatileValue_v = w
      previous_v ∪= previous_t;
      overwritten_v ∪= overwritten_t;

    Figure 7: Formal semantics without final fields or prescient writes

    Initially: a = b = 0
    Thread 1:      Thread 2:
    j = b;         i = a;
    a = 1;         b = 1;
    Can this result in i == j == 1?

    Figure 8: Motivation for Prescient Writes

    Initially: a = 0
    Thread 1:      Thread 2:
    j = a;         i = a;
    a = j;         a = i;
    Must not result in i == j == 42

    Figure 9: Prescient Writes must be Guaranteed

    Initially: a = b = c = 0
    Thread 1:      Thread 2:
    i = a;         k = b;
    j = a;         a = k;
    if (i == j)
      b = 2;
    Can i == j == k == 2?

    Figure 10: Motivation for guaranteedRedundantRead
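The rules for normal variables and monitors in Figure 7 can be read as a small executable model. The sketch below is an illustration only, not part of the paper: the class SimpleSemantics and its method names are hypothetical, threads and monitors are identified by strings, values are ints with default 0, object identity stands in for the GUID, and the lock count ("Acquire/increment") is not modelled. Instead of choosing one write nondeterministically, eligibleValues returns every value that readNormal could return.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Toy reading of Figure 7 (normal variables and monitors only). */
class SimpleSemantics {
    /** A write tuple; each instance is unique, so identity plays the role of the GUID. */
    static final class Write {
        final String variable;
        final int value;
        Write(String variable, int value) { this.variable = variable; this.value = value; }
    }

    /** The previous/overwritten sets kept for a thread or a monitor. */
    static final class Info {
        final Set<Write> previous = new HashSet<>();
        final Set<Write> overwritten = new HashSet<>();
    }

    private final Set<Write> allWrites = new HashSet<>();
    private final Set<Write> defaultWrites = new HashSet<>();
    private final Map<String, Info> threads = new HashMap<>();
    private final Map<String, Info> monitors = new HashMap<>();

    private Info thread(String t) {
        Info info = threads.get(t);
        if (info == null) {
            info = new Info();
            info.previous.addAll(defaultWrites); // every thread knows the default writes
            threads.put(t, info);
        }
        return info;
    }

    private Info monitor(String m) {
        return monitors.computeIfAbsent(m, k -> new Info());
    }

    /** Creating v performs a write of the default value (0 here) known to all threads. */
    void createVariable(String v) {
        Write w = new Write(v, 0);
        allWrites.add(w);
        defaultWrites.add(w);
        for (Info info : threads.values()) info.previous.add(w);
    }

    /** writeNormal: overwritten_t gains previous_t(v); w joins previous_t and allWrites. */
    void write(String t, String v, int value) {
        Info info = thread(t);
        for (Write p : info.previous) {
            if (p.variable.equals(v)) info.overwritten.add(p);
        }
        Write w = new Write(v, value);
        info.previous.add(w);
        allWrites.add(w);
    }

    /** Values readNormal may return: those of allWrites(v) minus overwritten_t. */
    Set<Integer> eligibleValues(String t, String v) {
        Info info = thread(t);
        Set<Integer> values = new TreeSet<>();
        for (Write w : allWrites) {
            if (w.variable.equals(v) && !info.overwritten.contains(w)) values.add(w.value);
        }
        return values;
    }

    /** lock: the thread absorbs the monitor's sets. */
    void lock(String t, String m) {
        Info ti = thread(t), mi = monitor(m);
        ti.previous.addAll(mi.previous);
        ti.overwritten.addAll(mi.overwritten);
    }

    /** unlock: the monitor absorbs the thread's sets. */
    void unlock(String t, String m) {
        Info ti = thread(t), mi = monitor(m);
        mi.previous.addAll(ti.previous);
        mi.overwritten.addAll(ti.overwritten);
    }

    public static void main(String[] args) {
        SimpleSemantics s = new SimpleSemantics();
        s.createVariable("x");
        s.write("T1", "x", 1);
        System.out.println(s.eligibleValues("T2", "x")); // unsynchronized: 0 or 1
        s.unlock("T1", "m");
        s.lock("T2", "m");
        System.out.println(s.eligibleValues("T2", "x")); // after synchronizing: only 1
    }
}
```

In the usage shown in main, the unsynchronized reader has two eligible values, and the unlock/lock pair on the same monitor shrinks the set to one, which is the mechanism behind the sequential-consistency argument for properly synchronized programs.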
  • 13. 8.5.2 Need for GuaranteedRedundantRead

The need for the guaranteedRedundantRead action stems from the use of prescient writes. Consider the example in Figure 10. It would be perfectly reasonable for a compiler to determine that the if test in Thread 1 will always evaluate to true, and then eliminate it. The compiler could then perform the write to b in Thread 1 early; the result of this code could be i = j = k = 2.

For this result to be possible in the semantics, however, a prescient write of 2 to b must occur at the beginning of Thread 1. However, i and j can read different values from a. This may cause the i == j test to fail; the actual write to b might not occur. To have a prescient write in this case is not allowed by the semantics described in Section 8.5.1.

The solution to this problem is to introduce guaranteed reads for i and j. If we guarantee that i and j will read the same value from a, then the if condition will always be true. This removes the restriction that otherwise prevents a prescient write of b = 2, namely that b = 2 might not be executed.

A guaranteedRedundantRead is simply a read that provides the assurance that the GUID read will be the same as another guaranteedRedundantRead's GUID. This allows the model to circumvent the restrictions of prescient writes when necessary.

8.5.3 Need for GuaranteedReadOfWrite

The guaranteedReadOfWrite action is quite similar to the guaranteedRedundantRead action. In this case, however, a read is guaranteed to see a particular write's GUID.

Consider Figure 11. We wish to have the result x == 0, y == 2. To do this we need a prescient write of y = 2. Under the rules for prescient writes, this cannot be done unless the condition of the if statement is guaranteed to evaluate to true. This is accomplished by changing the read of x in the if statement to a guaranteedReadOfWrite of the write x = 0.

    Initially: x = y = 0
    Thread 1:        Thread 2:
    x = 0;           x = y;
    if (x == 0)
      y = 2;
    Can x == 0, y == 2?

    Figure 11: Motivation for guaranteedReadOfWrite

8.5.4 Overview

The semantics of each of the actions are given in Figure 12. The write actions take one parameter: the write to be performed. The read actions take two parameters: a local that references an object to be read, and an element of that object (field or array element). The lock and unlock actions take one parameter: the monitor to be locked or unlocked.

We use

    info_x ∪= info_y

as shorthand for

    previousReads_x ∪= previousReads_y
    previous_x ∪= previous_y
    overwritten_x ∪= overwritten_y

8.5.5 Static variables

Before any reference to a static variable, the thread must ensure that the class is initialized.

8.5.6 Semantics of Prescient writes

Each write action is broken into two parts: initWrite and performWrite. The performWrite is always performed at the point where the write existed in the original program. Each performWrite has a corresponding initWrite that occurs before it and is performed on a write tuple with the same GUID. The initWrite can always be performed immediately before the performWrite. The initWrite may be performed prior to that (i.e., presciently) if the write is guaranteed to occur. This guarantee extends over non-deterministic choices for the values of reads.

We must guarantee that no properly synchronized read of the variable being written can be observed between the prescient write and the execution of the write by the original program. To accomplish this, we create a set previousReads(t) for every thread t which contains the set of values of variables that t knows have been read. A read can be added to this set in two ways: if t performed the read, or t has synchronized with a thread that contained the read in its previousReads(t) set.

If a properly synchronized read of the variable were to occur between the initWrite and the performWrite, the read would be placed in the previousReads set of the thread performing the write. We assert that this cannot happen; this maintains the necessary conditions for prescient writes.
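The initWrite/performWrite split and the role of previousReads can be illustrated with a second toy model. Everything here is an assumption made for illustration: the class PrescientWrites and its methods are hypothetical, a single variable is modelled, thread and monitor names share one map (so they must be distinct strings), the default write is omitted, and the caller makes the nondeterministic choice of which write a read sees. The per-thread uncommitted set used to hide a prescient write from its own thread is the one that appears in the paper's semantics for program actions.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Toy model of initWrite/performWrite for a single variable. */
class PrescientWrites {
    /** A write to the modelled variable; object identity stands in for the GUID. */
    static final class Write {
        final int value;
        Write(int value) { this.value = value; }
    }

    /** Knowledge held by a thread or a monitor. */
    static final class Info {
        final Set<Write> previous = new HashSet<>();
        final Set<Write> overwritten = new HashSet<>();
        final Set<Write> previousReads = new HashSet<>();
        final Set<Write> uncommitted = new HashSet<>(); // meaningful for threads only

        /** The info shorthand: union of previous, overwritten and previousReads. */
        void absorb(Info other) {
            previous.addAll(other.previous);
            overwritten.addAll(other.overwritten);
            previousReads.addAll(other.previousReads);
        }
    }

    private final Set<Write> allWrites = new HashSet<>();
    private final Map<String, Info> infos = new HashMap<>();

    private Info info(String name) {
        return infos.computeIfAbsent(name, k -> new Info());
    }

    /** initWrite: the write becomes visible to other threads, but not to t itself. */
    Write initWrite(String t, int value) {
        Write w = new Write(value);
        allWrites.add(w);
        info(t).uncommitted.add(w);
        return w;
    }

    /** performWrite: rejected if t has learned, via synchronization, that w was read. */
    void performWrite(String t, Write w) {
        Info ti = info(t);
        if (ti.previousReads.contains(w)) {
            throw new IllegalStateException("prescient write was observed via synchronization");
        }
        ti.overwritten.addAll(ti.previous);
        ti.previous.add(w);
        ti.uncommitted.remove(w);
    }

    /** True if a normal read in thread t may choose w. */
    boolean canRead(String t, Write w) {
        Info ti = info(t);
        return allWrites.contains(w) && !ti.uncommitted.contains(w) && !ti.overwritten.contains(w);
    }

    /** A normal read where the caller has already chosen the write to be seen. */
    int read(String t, Write w) {
        if (!canRead(t, w)) throw new IllegalArgumentException("write not eligible");
        info(t).previousReads.add(w);
        return w.value;
    }

    void lock(String t, String m)   { info(t).absorb(info(m)); }
    void unlock(String t, String m) { info(m).absorb(info(t)); }

    public static void main(String[] args) {
        PrescientWrites p = new PrescientWrites();
        Write w = p.initWrite("T1", 1);          // T1 issues its write early
        System.out.println(p.canRead("T1", w));  // T1 cannot see its own prescient write
        System.out.println(p.read("T2", w));     // T2 may read it without synchronizing
        p.performWrite("T1", w);                 // allowed: T1 never learned of T2's read
        System.out.println(p.canRead("T1", w));
    }
}
```

If T2 had released a monitor after its read and T1 had acquired the same monitor before performWrite, the read would have reached T1's previousReads set and performWrite would throw, which corresponds to the execution the semantics forbids.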
  • 14. The set uncommitted_t contains the set of presciently performed writes by a thread whose performWrite action has not occurred. Writes contained in a thread's uncommitted_t set are invisible to that thread. This set exists to reinforce the fact that the prescient write is invisible to the thread that executed it until the performWrite action. This would be handled by the assertion in performWrite, but making it clear that this is not a choice clarifies what it means for a prescient write to be guaranteed.

Guaranteed Reads are simply ordinary reads, the results of which are determined by the GUID they take as input.

    initWrite(Write ⟨v, w, g⟩)
      allWrites += ⟨v, w, g⟩
      uncommitted_t += ⟨v, w, g⟩

    performWrite(Write ⟨v, w, g⟩)
      Assert ⟨v, w, g⟩ ∉ previousReads_t
      overwritten_t ∪= previous_t(v)
      previous_t += ⟨v, w, g⟩
      uncommitted_t −= ⟨v, w, g⟩

    readNormal(Variable v)
      Choose ⟨v, w, g⟩ from allWrites(v) − uncommitted_t − overwritten_t
      previousReads_t += ⟨v, w, g⟩
      return w

    guaranteedReadOfWrite(Variable v, GUID g)
      Assert ∃ ⟨v, w, g⟩ ∈ previous_t − uncommitted_t − overwritten_t
      previousReads_t += ⟨v, w, g⟩
      return w

    guaranteedRedundantRead(Variable v, GUID g)
      Let ⟨v, w, g⟩ be the write seen by g
      Assert ⟨v, w, g⟩ ∈ previousReads_t − uncommitted_t − overwritten_t
      return w

    readStatic(Variable v)
      Choose ⟨v, w, g⟩ from allWrites(v) − uncommitted_t − overwritten_t
      previousReads_t += ⟨v, w, g⟩
      return w

    lock(Monitor m)
      Acquire/increment lock on m
      info_t ∪= info_m;

    unlock(Monitor m)
      info_m ∪= info_t;
      Release/decrement lock on m

    readVolatile(Variable v)
      info_t ∪= info_v
      return volatileValue_v

    writeVolatile(Write ⟨v, w, g⟩)
      volatileValue_v = w
      info_v ∪= info_t;

    Figure 12: Semantics of Program Actions Without Final Fields

8.5.7 Prescient Reads?

The semantics we have described does not need any explicit form of prescient reads to reflect ordering that might be done by a compiler or processor. The effects of prescient reads are produced by other parts of the semantics.

If a Read action were done early, the set of values that could be returned by the read would just be a subset of the values that could be returned at the original location of the Read. So the fact that a compiler or processor might perform a read early, or fulfill a read out of a local cache, cannot be detected and is allowed by the semantics, without any explicit provisions for prescient reads.

8.5.8 Other reorderings

The legality of many other compiler reorderings can be inferred from the semantics. These compiler reorderings could include speculative reads or the delay of a memory reference. For example, in the absence of synchronization operations, constructors and final fields, all memory references can be freely reordered subject to the usual constraints arising in transforming single-threaded code (e.g., you can't reorder two writes to the same variable).

8.6 Non-Atomic Volatiles

In this section, we describe why volatile variables must execute in more than one stage; we call this a non-atomic write to a volatile.

8.6.1 Need for Non-Atomic Volatiles

The example in Figure 13 gives a motivation for non-atomic volatile writes. Consider a processor architecture which allows writes by one processor to become visible to different processors in different orders.
  • 15.
    Initially: a = b = 0; a, b are volatile
    Thread 1     Thread 2              Thread 3     Thread 4
    a = 1;       int u = 0, v = 0;     b = 1;       int w = 0, x = 0;
                 u = b;                             w = a;
                 v = a;                             x = b;

    Figure 13: Can u == w == 0, v == x == 1?

Each thread in our example executes on a different processor. Thread 3's update to b may become visible to Thread 4 before Thread 1's update to a. This would result in w == 0, x == 1. However, Thread 1's update to a may become visible to Thread 2 before Thread 3's update to b. This would result in u == 0, v == 1.

The simple semantics enforce a total order over all volatile writes. This means that each thread must see accesses to every volatile variable in the order in which they were written. If this restriction is relaxed so that there is only a total order over writes to individual volatile variables, then the above situation is fixed.

So the design principle is simple: if two threads perform volatile writes to two different variables, then any threads reading those variables can read the writes in any order. We still want to enforce a total order over writes to the same variable, though; if two threads perform volatile writes to the same variable, they are guaranteed to be seen in a total order by reading threads.

8.6.2 Semantics of Non-Atomic Volatiles

To accomplish these goals, the semantics splits volatile writes into two actions: initVolatileWrite and performVolatileWrite. Each write to a volatile variable in the original code is represented by this two-stage instruction. The performVolatileWrite must be immediately preceded in the thread in which it occurs by the initVolatileWrite for that write. There can be no intervening instructions.

After an initVolatileWrite, other threads can see either the value that it wrote to the volatile, or the original value. Once a thread sees the new value of a partially completed volatile write, that thread can no longer see the old value. When the performVolatileWrite occurs, only the new value is visible. If one thread performs an initVolatileWrite of a volatile variable, any other thread that attempts to perform an initVolatileWrite of that variable is blocked until the first thread performs the performVolatileWrite.

The semantics for non-atomic volatile accesses can be seen in Figure 14.

    readVolatile(Local ⟨a, oF, kF⟩, Element e)
      Let v be the volatile referenced by a.e
      if (uncommittedVolatileValue_v = n/a) or
         (readThisVolatile_{t,⟨w,info_t⟩} = false)
        info_t ∪= info_v
        return ⟨volatileValue_v, kF, oF⟩
      else
        ⟨w, info_u⟩ = uncommittedVolatileValue_v
        volatileValue_v = w
        info_v ∪= info_u

    initVolatileWrite(Write ⟨v, w, g⟩)
      Assert uncommittedVolatileValue_v = n/a
      ∀t ∈ threads:
        readThisVolatile_{t,⟨w,info_t⟩} = false
      uncommittedVolatileValue_v = ⟨w, info_t⟩

    performVolatileWrite(Write ⟨v, w, g⟩)
      uncommittedVolatileValue_v = n/a
      volatileValue_v = w
      info_v ∪= info_t

    Figure 14: Semantics for Non-Atomic Volatiles

8.7 Full Semantics

In this section, we add semantics for final fields, as discussed in section 5. The addition of final fields completes the semantics.

8.7.1 New Types and Domains

local  A value stored in a stack location or local (i.e., not in a field or array element). A local is represented by a tuple ⟨a, oF, kF⟩, where a is a value (a reference to an object or a primitive value), oF is a set of writes known to be overwritten and kF is a set of writes to final fields known to
  • 16. have been frozen. oF and kF exist because of the special semantics of final fields.

8.7.2 Freezing final fields

When a constructor terminates normally, the thread performs freeze actions on all final fields defined in that class. If a constructor A1 for A chains to another constructor A2 for A, the fields are only frozen at the completion of A1. If a constructor B1 for B chains to a constructor A1 for A (a superclass of B), then upon completion of A1, final fields declared in A are frozen, and upon completion of B1, final fields declared in B are frozen.

Associated with each final variable v are

  • finalValue_v (the value of v)
  • overwritten_v (the write known to be overwritten by reading v)

Every read of any field is performed through a local ⟨a, oF, kF⟩. A read done in this way cannot return any of the writes in the set oF due to the special semantics of final fields. For each final field v, overwritten_v is the overwritten_t set of the thread that performed the freeze on v, at the time that the freeze was performed. overwritten_v is assigned when the freeze on v is performed. Whenever a read of a final field v is performed, the tuple returned contains the value of v and the union of overwritten_v with the local's oF set. The effect of this is that the writes in overwritten_v cannot be returned by any read derived from a read of v (condition F2).

The this parameter to the run method of a thread has an empty oF set, as does the local generated by a NEW operation.

8.7.3 Pseudo-final fields

If a reference to an object with a final field is loaded by a thread that did not construct that object, one of two things should be true:

  • That reference was written after the appropriate constructor terminated, or
  • synchronization is used to guarantee that the reference could not be loaded until after the appropriate constructor terminated.

The need to detect this is handled by the knownFrozen sets.

Each thread, monitor, volatile and reference (stored either in a heap variable or in a local) has a corresponding set knownFrozen of fields it knows to be frozen. When a final field is frozen, it is added to the knownFrozen set of the thread. A reference to an object consists of two things: the actual reference, and a knownFrozen set. When a reference ⟨r, kF⟩ is written to a variable v, v gets ⟨r, kF ∪ knownFrozen_t⟩, where knownFrozen_t is the knownFrozen set for that thread.

When a heap variable is read into a local, that reference's knownFrozen set and the thread's knownFrozen set are combined into a knownFrozen set for that local.

If that heap variable was written before a final field f was frozen (the end of f's constructor), and there has been no intervening synchronization to communicate the knownFrozen set from the thread that initialized f to the thread that is now reading it, then the local will not contain f in its knownFrozen set. If an attempt is then made to read a final field a.f, where a is a local, f will be read as a pseudo-final field.

If that reference was written after f was frozen, or there has been intervening synchronization to communicate the knownFrozen set from the thread that initialized f to the thread that is now reading it, then the local will contain f in its knownFrozen set. Any attempt to read a.f will therefore see the correctly constructed version.

A read of a pseudo-final field non-deterministically returns either the default value for the type of that field, or the value written to that field in the constructor (if that write has occurred).

Furthermore, if a final field is pseudo-final, it does not communicate any information about overwritten fields (as described in Section 8.7.2). No guarantee is made that objects accessed through that final field will be correctly constructed.

Objects can have multiple constructors (e.g., if class B extends A, then a B object has a B constructor and an A constructor). In such a case, if a B object becomes visible to other threads after the A constructor has terminated, but before the B constructor has terminated, then the final fields defined in B become pseudo-final, but the final fields of A remain final.

Final fields and Prescient writes  An initWrite of a reference a must not be reordered with an earlier freeze of a field of the object o referenced by a. This prevents a prescient write from allowing a reference to o to escape the thread before o's final fields have been frozen.
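The difference between a frozen final field and a reference that escapes before the freeze can be made concrete with a small Java sketch. The class names SafePoint, LeakyPoint and FinalFieldDemo are hypothetical, and the example is deliberately single-threaded: it only shows that a reference leaked from inside a constructor can observe the default value of a final field, which is the same pair of outcomes (default value or constructor-written value) that the semantics allows another thread to see for a pseudo-final field.

```java
/** Properly constructed: 'this' does not escape before the constructor ends. */
class SafePoint {
    final int x;

    SafePoint(int x) {
        this.x = x; // frozen when the constructor terminates normally
    }
}

/** Leaks 'this' before its final field has been written. */
class LeakyPoint {
    static int seenDuringConstruction = -1;
    final int x;

    LeakyPoint(int x) {
        observe(this); // the reference escapes before x is written or frozen
        this.x = x;
    }

    static void observe(LeakyPoint p) {
        seenDuringConstruction = p.x; // sees the default value of the field
    }
}

class FinalFieldDemo {
    public static void main(String[] args) {
        System.out.println(new SafePoint(7).x);
        LeakyPoint p = new LeakyPoint(7);
        System.out.println(LeakyPoint.seenDuringConstruction + " then " + p.x);
    }
}
```

A design consequence is that code wanting the final-field guarantee should not publish this (for example by registering a listener or storing it in a static field) from within a constructor.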
  • 17. 8.7.4 Overview

The final version of the semantics closely resembles the one in Figure 12. The freeze actions take one parameter: the final variable to be frozen.

We use

    info_x ∪= info_y

as shorthand for

    previousReads_x ∪= previousReads_y
    previous_x ∪= previous_y
    overwritten_x ∪= overwritten_y
    knownFrozen_x ∪= knownFrozen_y

8.7.5 Static Variables

Because of the semantics of class initialization, no special final semantics are needed for static variables.

8.8 Non-atomic longs and doubles

A read of a long or double variable v can return a combination of the first and second half of any two of the eligible values for v. If access to v is properly synchronized, then there will only be one write in the set of eligible values for v. In this case, the new value of v will not be a combination of two or more values (more precisely, it will be a combination of the first half and the second half of the same value). The specification for reads of longs and doubles is shown in Figure 16. The way in which these values might be combined is implementation dependent. This allows machines that do not have efficient 64-bit load/store instructions to implement loads/stores of longs and doubles as two 32-bit load/stores.

Note that reads and writes of volatile and final long and double variables are required to be atomic.

8.9 Finalizers

Finalizers are executed in an arbitrary thread t that holds no locks at the time the finalizer begins execution. For a finalizer on an object o, overwritten_t is the union of all writes to any field/element of o known to be overwritten by any thread at the time o is determined to be unreachable, along with the overwritten set of the thread that constructed o as of the moment the constructor terminated. The set previous_t is the union of all writes to any field/element of o known to be previous by any thread at the time o is determined to be unreachable, along with the previous set of the thread that constructed o as of the moment the constructor terminated.

It is strongly recommended that objects with non-trivial finalizers be synchronized. The semantics given here for unsynchronized finalization are very weak, but it isn't clear that a stronger semantics could be enforced.

8.10 Related Work

The simple semantics is closely related to Location Consistency [GS98]; the major difference is that in location consistency, an acquire or release affects only a single memory location. However, location consistency is more of an architectural level memory model, and does not directly support abstractions such as monitors, final fields or finalizers. Also, location consistency allows actions to be reordered "in ways that respect dependencies". We feel that our rules for prescient writes are more precise, particularly with regard to compiler transformations.

To underscore the similarity to Location Consistency, the previous_t(v) can be seen to be the same as the set {e | t ∈ processorset(e)} and everything reachable from that set by following edges backwards in the poset for v. Furthermore, the MRPW set is equal to previous_t(v) − overwritten_t.

    Thread 1:                        Thread 2:
    synchronized (new Object()) {    synchronized (new Object()) {
      x = 1;                           y = 1;
    }                                }
    synchronized (new Object()) {    synchronized (new Object()) {
      j = y;                           i = x;
    }                                }

    Figure 17: "Useless" synchronization

9 Optimizations

A number of papers [WR99, ACS99, BH99, Bla99, CGS+99] have looked at determining when synchronization in Java programs is "useless", and removing the synchronization. A "useless" synchronization is one whose effects cannot be observed. For example, synchronization on thread-local objects is "useless."

The existing Java thread semantics [GJS96, §17] does not allow for complete removal of "useless" synchronization. For example, in Figure 17, the existing semantics make it illegal to see 0 in both i and j, while under these proposed semantics, this outcome would be legal. It is hard to imagine any reasonable
  • 18.
    updateReference(Value w, knownFrozen kf)
      if w is primitive, return w
      let [r, k] = w
      return [r, k ∪ kf]

    initWrite(Write ⟨v, w, g⟩)
      w′ = updateReference(w, knownFrozen_t)
      allWrites += ⟨v, w′, g⟩
      uncommitted_t += ⟨v, w′, g⟩

    performWrite(Write ⟨v, w, g⟩)
      w′ = updateReference(w, knownFrozen_t)
      Assert ⟨v, w′, g⟩ ∉ previousReads_t
      overwritten_t ∪= previous_t(v)
      previous_t += ⟨v, w′, g⟩
      uncommitted_t −= ⟨v, w′, g⟩

    readNormal(Local ⟨a, oF, kF⟩, Element e)
      Let v be the variable referenced by a.e
      Choose ⟨v, w, g⟩ from allWrites(v) − oF − uncommitted_t − overwritten_t
      previousReads_t += ⟨v, w, g⟩
      ⟨r, kF′⟩ = updateReference(w, knownFrozen_t)
      return ⟨r, kF′, oF⟩

    guaranteedReadOfWrite(Value ⟨a, oF, kF⟩, Element e, GUID g)
      Let v be the variable referenced by a.e
      Assert ∃ ⟨v, w, g⟩ ∈ previous_t − uncommitted_t − overwritten_t
      previousReads_t += ⟨v, w, g⟩
      ⟨r, kF′⟩ = updateReference(w, knownFrozen_t)
      return ⟨r, kF′, oF⟩

    guaranteedRedundantRead(Value ⟨a, oF, kF⟩, Element e, GUID g)
      Let v be the variable referenced by a.e
      Let ⟨v, w, g⟩ be the write seen by g
      Assert ⟨v, w, g⟩ ∈ previousReads_t − uncommitted_t − overwritten_t
      ⟨r, kF′⟩ = updateReference(w, knownFrozen_t)
      return ⟨r, kF′, oF⟩

    readStatic(Variable v)
      Choose ⟨v, w, g⟩ from allWrites(v) − uncommitted_t − overwritten_t
      previousReads_t += ⟨v, w, g⟩
      ⟨r, kF⟩ = updateReference(w, knownFrozen_t)
      return ⟨r, ∅, kF⟩

    lock(Monitor m)
      Acquire/increment lock on m
      info_t ∪= info_m;

    unlock(Monitor m)
      info_m ∪= info_t;
      Release/decrement lock on m

    readVolatile(Local ⟨a, oF, kF⟩, Element e)
      Let v be the volatile referenced by a.e
      if (uncommittedVolatileValue_v = n/a) or
         (readThisVolatile_{t,⟨w,info_t⟩} = false)
        info_t ∪= info_v
        return ⟨volatileValue_v, kF, oF⟩
      else
        ⟨w, info_u⟩ = uncommittedVolatileValue_v
        volatileValue_v = w
        info_v ∪= info_u

    initVolatileWrite(Write ⟨v, w, g⟩)
      Assert uncommittedVolatileValue_v = n/a
      ∀t ∈ threads:
        readThisVolatile_{t,⟨w,info_t⟩} = false
      uncommittedVolatileValue_v = ⟨w, info_t⟩

    performVolatileWrite(Write ⟨v, w, g⟩)
      uncommittedVolatileValue_v = n/a
      volatileValue_v = w
      info_v ∪= info_t

    writeFinal(Write ⟨v, w, g⟩)
      finalValue_v = w

    freezeFinal(Variable v)
      overwritten_v = overwritten_t
      knownFrozen_t += v

    readFinal(Local ⟨a, oF, kF⟩, Element e)
      Let v be the final variable referenced by a.e
      if v ∈ kF
        oF = overwritten_v
        return ⟨finalValue_v, kF, oF ∪ overwritten_v⟩
      else
        w = either finalValue_v or defaultValue_v
        return ⟨w, kF, oF⟩

    Figure 15: Full Semantics of Program Actions
  • 19. programming style that depends on the ordering constraints arising from this kind of "useless" synchronization.

The semantics we have proposed make a number of synchronization optimizations legal, including:

1. Complete elimination of lock and unlock operations on a monitor unless more than one thread performs lock/unlock operations on that monitor. Since no other thread will see the information associated with the monitor, the operations have no effect.

2. Complete elimination of reentrant lock/unlock operations (e.g., when a synchronized method calls another synchronized method on the same object). Since no other thread can touch the information associated with the monitor while the outer lock is in effect, any inner lock/unlock actions have no effect.

3. Lock coarsening. For example, given two successive calls to synchronized methods on the same monitor, it is legal simply to perform one Lock, before the first method call, and perform one Unlock, after the second call. This is legal because if no other thread acquired the lock between the two calls, then the Unlock/Lock actions between the two calls have no effect. Note: there are liveness issues associated with lock coarsening, which need to be addressed separately. The Java specification should probably require that if a lower priority thread gives up a lock and a higher priority thread is waiting for a lock on the same object, the higher priority thread is given the lock. For equal priority threads, some fairness guarantee should be made.

4. Replacement of a thread local volatile field (i.e., one accessed by only a single thread) with a normal field. Since no other thread will see the information associated with the volatile, the overwritten and previous information associated with the volatile will not be seen by other threads; since the variable is thread local, all accesses are guaranteed to be correctly synchronized.

5. Forward substitution across lock acquires. For example, if a variable x is written, a lock is acquired, and x is then read, then it is possible to use the value written to x as the value read from x. This is because the lock action does not guarantee that any values written to x by another thread will be returned by a read in this thread if this thread performed an unsynchronized write of x. In general, it is possible to move most operations to normal variables inside synchronized blocks.

    readNormalLongOrDouble(Value ⟨a, oF⟩, element e)
      Let v be the variable referenced by a.e
      Let v′ and v″ be arbitrary values from
        allWrites(v) − overwritten_t − uncommitted_t − oF
      return ⟨combine(firstPart(v′), secondPart(v″)), kF_v′ ∪ kF_v″, oF⟩

    Figure 16: Formal semantics for longs and doubles

    Initially: p.next = null
    Thread 1:
    p.next = p
    Thread 2:
    List tmp = p.next;
    if (tmp == p
        && tmp.next == null) {
      // Can't happen under CRF
    }

    Figure 18: CRF is constrained by data dependences

    Initially: a = 0
    Thread 1:      Thread 2:
    a = 1;         a = 2;
    i = a;         j = a;
    CRF does not allow i == 2 and j == 1

    Figure 19: Global memory constraints in CRF

10 Related Work

Maessen et al. [MS00] present an operational semantics for Java threads based on the CRF model. At the user level, the proposed semantics are very similar to those proposed in this paper (due to the fact that we
  • 20. met together to work out the semantics). However, we believe there are some troublesome (although perhaps not fatal) issues with that paper.

Perhaps most seriously, the CRF model doesn’t distinguish between final fields and non-final fields as far as seeing the writes performed in a constructor. As discussed in [MS00, §6.1], they rely on memory barriers at the end of constructors to order the writes and data dependences to order the reads. This means that in Figure 1b, their semantics prohibit r3 == 0, even though the x field is not final. Since this guarantee requires additional memory barriers on systems using the Alpha memory model, it is undesirable to make it for non-final fields.

Another problem is that [MS00] does not allow as much elimination of “useless synchronization”. The CRF-based specification provides a special rule to allow skipping coherence actions associated with a monitorenter if the thread that previously released the lock is the same thread as the current thread. However, no such rule applies to monitorexit. As a result, in Figure 17 it is illegal to see 0 in both i and j. Also, their model doesn’t provide any “coherence-skipping” rule for volatiles, so memory barriers must be associated with thread-local volatile fields. Also, while the CRF semantics allow skipping the memory barrier instructions associated with monitorenter on thread-local monitors, it isn’t clear that it allows compiler reordering past thread-local synchronization.

In contrast, under our model most synchronization optimizations, such as removal of “useless synchronization”, fall out naturally as a consequence of using a lazy release consistency [CZ92] style semantics.

Furthermore, the handling of control and data dependences is worrisome. Speculative reads are represented by moving Load instructions earlier in execution. However, for an operational semantics, it is hard to imagine executing a Load instruction before you know the address that needs to be Loaded. In fact, they specifically prohibit it [MS00, §6.1] in order to get the required semantics for final fields.

For example, the code in Figure 18 shows a behavior prohibited by CRF. Since the read of tmp.next is data dependent on the read of p.next, it must follow the read of p.next.

While it is hard to imagine a compiler transformation or processor architecture in which this reordering could occur, the prohibition nonetheless imposes a proof burden: showing that an implementation never performs this reordering, which CRF does not allow.

Similarly, because CRF models a single global memory through which all communication is performed, certain behaviors are prohibited. For example, in Figure 19 it is prohibited that i == 2 and j == 1. This prohibition has nothing to do with safety guarantees or execution of correctly synchronized programs. Rather, it is just an artifact of the CRF model. An implementation of Java on an aggressive SMP architecture that allowed this behavior would not correctly adhere to the CRF-based semantics.

11 Conclusion

We have proposed both an informal and formal memory model for multithreaded Java programs. This model will both allow people to write reliable multithreaded programs and give JVM implementors the ability to create efficient implementations.

It is essential that a compiler writer understand what optimizations and transformations are allowed by a memory model. Ideally, in code that doesn’t contain synchronization operations, all the standard compiler optimizations would be legal. In fact, no proof of this could be forthcoming because there are a few standard optimizations that are not legal. In particular, in a single-threaded environment, if you prove there are no writes to a variable between two reads, you can assume that both reads return the same value, and possibly omit some bounds checking or null-pointer checks that would otherwise be required. In a multithreaded setting, no such causal assumptions can be made.

However, the process of understanding and documenting the interactions between the memory model and optimizations is of vital importance and will be the focus of continuing work.

Now that a broad community has reached rough consensus on an informal semantics for multithreaded Java, the important next step is to formalize that model. Doing so requires figuring out all of the corner cases, and providing a framework that would allow formal reasoning about the model. We believe that this proposal provides both the guarantees needed by Java programmers and the freedoms needed by JVM implementors.

Acknowledgments

Thanks to the many people who have participated in the discussions of this topic, particularly Sarita Adve, Arvind, Joshua Bloch, Joseph Bowbeer, David Detlefs, Sanjay Ghemawat, Paul Haahr, David Holmes, Doug Lea, Tim Lindholm, Jan-Willem Maessen, Xiaowei Shen, Raymie Stata, Guy Steele and Dennis Sosnoski.
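The Figure 19 argument from the Related Work discussion can be checked mechanically with a small sketch. The class below is hypothetical and is not taken from [MS00]: it enumerates every interleaving of the two threads against one shared memory cell, which is a deliberately simplified and stronger model than CRF. It only illustrates why the outcome i == 2 and j == 1 is unreachable when all writes to a pass through a single memory in some order.

```java
import java.util.Set;
import java.util.TreeSet;

// Hypothetical illustration: exhaustive interleavings of Figure 19
// against a single shared memory cell (a simplified model, not CRF itself).
public class Figure19Interleavings {

    // Returns every reachable (i, j) outcome, encoded as "i=<n>,j=<n>".
    static Set<String> outcomes() {
        Set<String> result = new TreeSet<>();
        explore(0, 0, 0, 0, 0, result);
        return result;
    }

    // pc1/pc2: index of the next statement of Thread 1 / Thread 2 (2 = done).
    // a: the single shared memory cell; i, j: the threads' local results.
    private static void explore(int pc1, int pc2, int a, int i, int j,
                                Set<String> out) {
        if (pc1 == 2 && pc2 == 2) {
            out.add("i=" + i + ",j=" + j);
            return;
        }
        if (pc1 == 0) {
            explore(1, pc2, 1, i, j, out);   // Thread 1: a = 1
        } else if (pc1 == 1) {
            explore(2, pc2, a, a, j, out);   // Thread 1: i = a
        }
        if (pc2 == 0) {
            explore(pc1, 1, 2, i, j, out);   // Thread 2: a = 2
        } else if (pc2 == 1) {
            explore(pc1, 2, a, i, a, out);   // Thread 2: j = a
        }
    }

    public static void main(String[] args) {
        System.out.println(outcomes());
    }
}
```

In this simplified model the pair i == 2, j == 1 never appears, because it would require the write of 2 to follow the write of 1 and also to precede it. A memory model without a single global memory need not impose that constraint.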
  • 21. References

[ACS99] Jonathan Aldrich, Craig Chambers, and Emir Gun Sirer. Eliminating unnecessary synchronization from Java programs. In OOPSLA poster session, October 1999.
[BH99] Jeff Bogda and Urs Hoelzle. Removing unnecessary synchronization in Java. In OOPSLA, October 1999.
[Bla99] Bruno Blanchet. Escape analysis for object oriented languages; application to Java. In OOPSLA, October 1999.
[CGS+99] Jong-Deok Choi, Manish Gupta, Mauricio Serrano, Vugranam Sreedhar, and Sam Midkiff. Escape analysis for Java. In OOPSLA, October 1999.
[CZ92] Pete Keleher, Alan L. Cox, and Willy Zwaenepoel. Lazy release consistency for software distributed shared memory. In The Proceedings of the 19th International Symposium on Computer Architecture, pages 13–21, May 1992.
[GJS96] James Gosling, Bill Joy, and Guy Steele. The Java Language Specification. Addison Wesley, 1996.
[GLL+90] K. Gharachorloo, D. Lenoski, J. Laudon, P. Gibbons, A. Gupta, and J. L. Hennessy. Memory consistency and event ordering in scalable shared-memory multiprocessors. In Proceedings of the Seventeenth International Symposium on Computer Architecture, pages 15–26, May 1990.
[GS98] Guang Gao and Vivek Sarkar. Location consistency – a new memory model and cache consistency protocol. Technical Report 16, CAPSL, Univ. of Delaware, February 1998.
[JMM] The Java memory model. Mailing list and web page. http://www.cs.umd.edu/~pugh/java/memoryModel.
[LY99] Tim Lindholm and Frank Yellin. The Java Virtual Machine Specification. Addison Wesley, 2nd edition, 1999.
[MP01] Jeremy Manson and William Pugh. Core semantics of multithreaded Java. In ACM Java Grande Conference, June 2001.
[MS00] Jan-Willem Maessen, Arvind, and Xiaowei Shen. Improving the Java memory model using CRF. In OOPSLA, pages 1–12, October 2000.
[Pug99] William Pugh. Fixing the Java memory model. In ACM Java Grande Conference, June 1999.
[Pug00a] William Pugh. The double checked locking is broken declaration. http://www.cs.umd.edu/users/pugh/java/memoryModel/DoubleCheckedLocking.html, July 2000.
[Pug00b] William Pugh. The Java memory model is fatally flawed. Concurrency: Practice and Experience, 12(1):1–11, 2000.
[WR99] John Whaley and Martin Rinard. Compositional pointer and escape analysis for Java programs. In OOPSLA, October 1999.

A Class Initialization

The JVM specification requires [LY99, §5.5] that before executing a GETSTATIC, PUTSTATIC, INVOKESTATIC or a NEW instruction on a class C, or initializing a subclass of C, class C must be initialized. Furthermore, class C may not be initialized before it is required by the above rule. (Classes can also be initialized due to use of reflection or by being designated as the initial class of the JVM.)

Although the JVM specification does not spell it out, it is clear that any situation that requires that a thread T1 check to see that a class C has been initialized must also require that T1 see all of the memory actions resulting from the initialization of class C.

This has a number of subtle and surprising implications for compilation, and interactions with the threading model.

Initializing a class invokes the static initializer for the class, which can be arbitrary code. Thus, any GETSTATIC, PUTSTATIC, INVOKESTATIC or NEW instruction on a class C, which might be the very first invocation of an instruction on class C, must be treated as a potential call of the initialization code. Thus, if A and B are classes, the expression A.x+B.y+A.x cannot always be optimized to A.x*2+B.y; the read of B.y may have side effects that change the value of A.x (because it might invoke the initialization code for B that could modify A.x).

It would be possible to perform static analysis to verify that a particular instruction could not possibly be the first time a thread was required to check that a class was initialized. Also, you could check that the results of initializing a class were not visible outside the class. Either analysis would allow the instruction to be reordered with other instructions.

A quick reading of the spec might suggest that a thread can simply check a boolean flag to see if the class is initialized, and skip initialization code if the class is already initialized. This is almost true. However, the thread checking to see that the class is initialized must see all updates caused by initializing the class. This may require flushing registers and performing a memory barrier.
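The A.x+B.y+A.x example from this appendix can be reproduced with a short sketch. The nested classes A and B and their values are hypothetical, chosen only so that initializing B visibly changes A.x. The three temporaries mirror Java's left-to-right evaluation of the expression, and the result is cached because a class is initialized only once per class loader.

```java
// Hypothetical illustration of a static initializer with a side effect
// on another class's static field.
public class ClassInitSideEffect {

    static class A {
        static int x = 1;
    }

    static class B {
        static int y = 5;
        static {
            A.x = 10;   // side effect of initializing B
        }
    }

    private static int[] cached;

    // Returns {value of A.x + B.y + A.x as written, value of the folded form A.x*2 + B.y
    // computed from the first read of A.x}. Computed once, on first call.
    static synchronized int[] observe() {
        if (cached == null) {
            int first = A.x;    // initializes A, reads its initial value
            int y = B.y;        // first use of B: runs B's static initializer
            int second = A.x;   // reads the value left behind by B's initializer
            cached = new int[] { first + y + second, first * 2 + y };
        }
        return cached.clone();
    }

    public static void main(String[] args) {
        int[] r = observe();
        System.out.println("as written: " + r[0] + ", folded: " + r[1]);
    }
}
```

The two values differ here, so a compiler that reused the first read of A.x across the read of B.y would change the program's result unless it could show that B was already initialized.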
  • 22. Similarly, once a xxxSTATIC or NEW instruction has been invoked, it is tempting to rewrite the code to eliminate the initialization check. However, this rewrite cannot be done until all threads have done the barrier required to see the effects of initializing the class.

Another surprising result is that the existing spec allows a thread to invoke methods and read/write instance fields of an instance of a class C before seeing all of the effects of the initialization of that class. How could this happen? Consider if thread T1 initializes class C, creates an instance x of class C, and then stores a reference to the instance into a global variable. Thread T2 could then, without synchronization, read the global variable, see the reference to x, and invoke a virtual method on x. At this point, although C has been initialized, T2 hasn’t done the memory barrier or register flushes that would be required to see the updates performed by initializing class C. This means that even within virtual methods of class C, we can’t automatically eliminate/skip initialization checks associated with GETSTATIC, PUTSTATIC, INVOKESTATIC or NEW instructions on a class C.
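The "check a flag, but also see the initializer's writes" requirement has a user-level analogue that may help make it concrete. The sketch below is not how any JVM implements class initialization. It is a hypothetical guard, and it assumes the Java 5 and later memory model, under which a write to a volatile field happens-before a later read of that field that sees the written value. All names (InitGuardSketch, config, initRuns) are invented for illustration.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical user-level analogue of an initialization check whose fast
// path still guarantees visibility of the initializer's writes.
public class InitGuardSketch {

    private static int config;                         // plain state written by the "initializer"
    private static volatile boolean initialized = false;
    static final AtomicInteger initRuns = new AtomicInteger();

    private static void ensureInitialized() {
        if (!initialized) {                            // volatile read: plays the role of the barrier
            synchronized (InitGuardSketch.class) {
                if (!initialized) {
                    config = 42;                       // body of the "static initializer"
                    initRuns.incrementAndGet();
                    initialized = true;                // volatile write: publishes config
                }
            }
        }
    }

    static int readConfig() {
        ensureInitialized();
        return config;
    }

    public static void main(String[] args) throws InterruptedException {
        final int[] seen = new int[4];
        Thread[] threads = new Thread[seen.length];
        for (int t = 0; t < threads.length; t++) {
            final int k = t;
            threads[t] = new Thread(() -> seen[k] = readConfig());
            threads[t].start();
        }
        for (Thread th : threads) {
            th.join();
        }
        System.out.println(java.util.Arrays.toString(seen) + " runs=" + initRuns.get());
    }
}
```

If the flag were a plain boolean, a thread could observe it as set without any guarantee of seeing the write to config, which is the hazard the appendix describes for threads that skip the check.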
