Intel JIT Talk

2,611 views

Published on

Published in: Technology
0 Comments
7 Likes
Statistics
Notes
  • Be the first to comment

No Downloads
Views
Total views
2,611
On SlideShare
0
From Embeds
0
Number of Embeds
35
Actions
Shares
0
Downloads
49
Comments
0
Likes
7
Embeds 0
No embeds

No notes for slide
  • I’m going to start off by showing my favorite demo, that I think really captures what we’re enabling by making the browser as fast possible.
  • So this is Doom, translated - actually compiled - from C++ to JavaScript, being played in Firefox. In addition to JavaScript it’s using the new 2D drawing element called Canvas. This demo is completely native to the browser. There’s no Flash, it’s all open web technologies. And it’s managing to get a pretty good framerate, 30fps, despite being mostly unoptimized. This is a really cool demo for me because, three years ago, this kind of demo on the web was just impossible without proprietary extensions. Some of the technology just wasn’t there, and the rest of it was anywhere from ten to a hundred times slower. So we’ve come a long way.… Unfortunately all I have of the demo is a recording, it was playable online for about a day but iD Software requested that we take it down.
  • To step back for a minute, I’m going to briefly overview how the browser and the major web technologies interact, and then jump into JavaScript which is the heart of this talk.
  • This is a much less gory example, a major company’s website in a popular web browser. The process of getting from the intel.com URL to this display is actually pretty complicated.
  • The browser then sends intel.com this short plaintext message which says, “give me the webpage corresponding to the top of your website’s hierarchy.”
  • How does the browser get from all of this to the a fully rendered website?
  • Now that we’re familiar with the language’s main features, let’s talk about implementing a JavaScript engine. A JavaScript implementation has four main pieces
  • One of the challenges of a JavaScript engine is dealing with the lack of type declarations. In order to dispatch the correct computation, the runtime must be able to query a variable or field’s current type.
  • We’re done with data representation and garbage collection, so it’s time for the really fun stuff: running JavaScript code, and making it run fast. One way to run code, but not make it run fast, is to use an interpreter.
  • Intel JIT Talk

    1. 1. High Performance JavaScript: JITs in Firefox<br />
    2. 2. Power of the Web<br />
    3. 3. How did we get here?<br /><ul><li>Making existing web tech faster and faster
    4. 4. Adding new technologies, like video, audio, integration and 2D+3D drawing</li></li></ul><li>The Browser<br />
    5. 5. Getting a Webpage<br />www.intel.com<br />192.198.164.158<br />
    6. 6. Getting a Webpage<br />www.intel.com<br />192.198.164.158<br />GET / HTTP/1.0<br />(HTTP Request)<br />
    7. 7. Getting a Webpage<br /><html><head><title>Laptop, Notebooks, ...</title></head><script language="javascript">functiondoStuff() {...}</script><body>...<br />(HTML, CSS, JavaScript)<br />
    8. 8. Core Web Components<br /><ul><li>HTML provides content
    9. 9. CSS provides styling
    10. 10. JavaScript provides a programming interface</li></li></ul><li>New Technologies<br /><ul><li>2D Canvas
    11. 11. Video, Audio tags
    12. 12. WebGL (OpenGL + JavaScript)</li></li></ul><li>DOM<br />(Source: http://www.w3schools.com/htmldom/default.asp)<br />
    13. 13. CSS<br />Cascading Style Sheets<br />Separates presentation from content<br /><b class=”warning”>This is red</b><br />.warning {color: red;text-decoration: underline;}<br />
    14. 14. What is JavaScript?<br /><ul><li>C-like syntax
    15. 15. De facto language of the web
    16. 16. Interacts with the DOM and the browser</li></li></ul><li>C-Like Syntax<br /><ul><li>Braces, semicolons, familiar keywords</li></ul>if (x) {for (i=0; i<100; i++)print("Hello!");}<br />
    17. 17. Displaying a Webpage<br />Layout engine applies CSS to DOM<br />Computes geometry of elements<br />JavaScript engine executes code<br />This can change DOM, triggers “reflow”<br />Finished layout is sent to graphics layer<br />
    18. 18. The Result<br />
    19. 19. JavaScript<br /><ul><li>Invented by Brendan Eich in 1995 for Netscape 2
    20. 20. Initial version written in 10 days
    21. 21. Anything from small browser interactions, complex apps (Gmail) to intense graphics (WebGL)</li></li></ul><li>The Result<br />
    22. 22. Think LISP or Self, not Java<br />Untyped - no type declarations<br />Multi-Paradigm – objects, closures, first-class functions<br />Highly dynamic - objects are dictionaries<br />
    23. 23. No Type Declarations<br /><ul><li>Properties, variables, return values can be anything:</li></ul>functionf(a) {var x =72.3;if (a) x = a +"string";return x;}<br />
    24. 24. No Type Declarations<br /><ul><li>Properties, variables, return values can be anything:</li></ul>if a is:<br /> number: add;<br /> string: concat;<br />functionf(a) {var x =“hi”;if (a) x = a +33.2;return x;}<br />
    25. 25. Functional<br /><ul><li>Functions may be returned, passed as arguments:</li></ul>functionf(a) {returnfunction () {return a; }}var m = f(5);print(m());<br />
    26. 26. Objects<br /><ul><li>Objects are dictionaries mapping strings to values
    27. 27. Properties may be deleted or added at any time!</li></ul>var point = { x : 5, y : 10 };deletepoint.x;point.z=12;<br />
    28. 28. Prototypes<br />Every object can have a prototype object<br />If a property is not found on an object, its prototype is searched instead<br />… And the prototype’s prototype, etc..<br />
    29. 29. Numbers<br /><ul><li>JavaScript specifies that numbers are IEEE-754 64-bit floating point
    30. 30. Engines use 32-bit integers to optimize
    31. 31. Must preserve semantics: integers overflow to doubles</li></li></ul><li>Numbers<br />varx = 0x7FFFFFFF;<br />x++;<br />intx = 0x7FFFFFFF;<br />x++;<br />
    32. 32. JavaScript Implementations<br /><ul><li>Values (Objects, strings, numbers)
    33. 33. Garbage collector
    34. 34. Runtime library
    35. 35. Execution engine (VM)</li></li></ul><li>Values<br />The runtime must be able to query a value’s type to perform the right computation.<br />When storing values to variables or object fields, the type must be stored as well.<br />
    36. 36. Values<br /><ul><li>Boxed values required for variables, object fields
    37. 37. Unboxed values required for computation</li></li></ul><li>Boxed Values<br /><ul><li>Need a representation in C++
    38. 38. Easy idea: 96-bit struct
    39. 39. 32-bits for type
    40. 40. 64-bits for double, pointer, or integer
    41. 41. Too big! We have to pack better.</li></li></ul><li>Boxed Values<br />We could use pointers, and tag the low bits<br />Doubles would have to be allocated on the heap<br />Indirection, GC pressure is bad<br />There is a middle ground…<br />
    42. 42. <ul><li>Values are 64-bit
    43. 43. Doubles are normal IEEE-754
    44. 44. How to pack non-doubles?
    45. 45. 51 bits of NaN space!</li></ul>IA32, ARM<br />Nunboxing<br />
    46. 46. Nunboxing<br />IA32, ARM<br />Type<br />Payload<br />0x400c0000<br />0x00000000<br />63<br />0<br /><ul><li>Full value: 0x400c000000000000
    47. 47. Type is double because it’s not a NaN
    48. 48. Encodes: (Double, 3.5)</li></li></ul><li>Nunboxing<br />IA32, ARM<br />Type<br />Payload<br />0x00000040<br />0xFFFF0001<br />63<br />0<br /><ul><li>Full value: 0xFFFF000100000040
    49. 49. Value is in NaN space
    50. 50. Type is 0xFFFF0001(Int32)
    51. 51. Value is (Int32, 0x00000040)</li></li></ul><li>Punboxing<br />x86-64<br />Type<br />Payload<br />63<br />0<br />47<br />1111 1111 1111 1000 0<br />000 .. 0000 0000 0000 0000 0100 0000<br /><ul><li>Full value: 0xFFFF800000000040
    52. 52. Value is in NaN space
    53. 53. Bits >> 47 == 0x1FFFF == INT32
    54. 54. Bits 47-63 masked off to retrieve payload
    55. 55. Value(Int32, 0x00000040)</li></li></ul><li>* Some 64-bit OSes can restrict mmap() to 32-bits<br />
    56. 56. Garbage Collection<br />Need to reclaim memory without pausing user workflow, animations, etc<br />Need very fast object allocation<br />Consider lots of small objects like points, vectors<br />Reading and writing to the heap must be fast<br />
    57. 57. Mark and Sweep<br />All live objects are found via a recursive traversal of the root value set<br />All dead objects are added to a free list<br />Very slow to traverse entire heap<br />Building free lists can bring “dead” memory into cache<br />
    58. 58. Improvements<br /><ul><li>Sweeping, freelist building have been moved off-thread
    59. 59. Marking can be incremental, interleaved with program execution</li></li></ul><li>Generational GC<br />Observation: most objects are short-lived<br />All new objects are bump-allocated into a newborn space<br />Once newborn space is full, live objects are moved to the tenured space<br />Newborn space is then reset to empty<br />
    60. 60. Generational GC<br />Inactive<br />Active<br />Newborn Space<br />8MB<br />8MB<br />Tenured Space<br />
    61. 61. After Minor GC<br />Active<br />Inactive<br />Newborn Space<br />8MB<br />8MB<br />Tenured Space<br />
    62. 62. Barriers<br />Incremental marking, generational GC need read or write barriers<br />Read barriers are much slower<br />Azul Systems has hardware read barrier<br />Write barrier seems to be well-predicted?<br />
    63. 63. Unknowns<br />How much do cache effects matter?<br />Memory GC has to touch<br />Locality of objects<br />What do we need to consider to make GC fast?<br />
    64. 64. Running JavaScript<br /><ul><li>Interpreter – Runs code, not fast
    65. 65. Basic JIT – Simple, untyped compiler
    66. 66. Advanced JIT – Typed compiler, lightspeed</li></li></ul><li>Interpreter<br /><ul><li>Good for code that runs once
    67. 67. Giant switch loop
    68. 68. Handles all edge cases of JS semantics</li></li></ul><li>Interpreter<br />while (true) {switch (*pc) {case OP_ADD: ...case OP_SUB: ...case OP_RETURN: ... } pc++;}<br />
    69. 69. Interpreter<br />case OP_ADD: { Value lhs = POP(); Value rhs = POP(); Value result;if (lhs.isInt32() && rhs.isInt32()) {int left = rhs.toInt32();int right = rhs.toInt32();if (AddOverflows(left, right, left + right)) result.setInt32(left + right);elseresult.setNumber(double(left) + double(right)); } elseif (lhs.isString() || rhs.isString()) { String *left = ValueToString(lhs); String *right = ValueToString(rhs); String *r = Concatenate(left, right);result.setString(r); } else {double left = ValueToNumber(lhs);double right = ValueToNumber(rhs);result.setDouble(left + right); }PUSH(result);break;<br />
    70. 70. Interpreter<br />Very slow!<br />Lots of opcode and type dispatch<br />Lots of interpreter stack traffic<br />Just-in-time compilation solves both<br />
    71. 71. Just In Time<br />Compilation must be very fast<br />Can’t introduce noticeable pauses<br />People care about memory use<br />Can’t JIT everything<br />Can’t create bloated code<br />May have to discard code at any time<br />
    72. 72. Basic JIT<br />JägerMonkey in Firefox 4<br />Every opcode has a hand-coded template of assembly (registers left blank)<br />Method at a time:<br />Single pass through bytecode stream!<br />Compiler uses assembly templates corresponding to each opcode<br />
    73. 73. Bytecode<br />GETARG 0 ; fetch x<br />GETARG 1 ; fetch y<br />ADD<br />RETURN<br />functionAdd(x, y) {return x + y;}<br />
    74. 74. Interpreter<br />case OP_ADD: { Value lhs = POP(); Value rhs = POP(); Value result;if (lhs.isInt32() && rhs.isInt32()) {int left = rhs.toInt32();int right = rhs.toInt32();if (AddOverflows(left, right, left + right)) result.setInt32(left + right);elseresult.setNumber(double(left) + double(right)); } elseif (lhs.isString() || rhs.isString()) { String *left = ValueToString(lhs); String *right = ValueToString(rhs); String *r = Concatenate(left, right);result.setString(r); } else {double left = ValueToNumber(lhs);double right = ValueToNumber(rhs);result.setDouble(left + right); }PUSH(result);break;<br />
    75. 75. Assembling ADD<br /><ul><li>Inlining that huge chunk for every ADD would be very slow
    76. 76. Observation:
    77. 77. Some input types much more common than others</li></li></ul><li>Integer Math – Common<br />var j =0;for (i=0; i<10000; i++) { j +=i;}<br />Weird Stuff – Rare!<br />var j =12.3;for (i =0; i <10000; i++) { j +=new Object() + i.toString();}<br />
    78. 78. Assembling ADD<br /><ul><li>Only generate code for the easiest and most common cases
    79. 79. Large design space
    80. 80. Can consider integers common, or
    81. 81. Integers and doubles, or
    82. 82. Anything!</li></li></ul><li>Basic JIT - ADD<br />if (arg0.type != INT32) gotoslow_add;if (arg1.type != INT32)gotoslow_add;<br />
    83. 83. Basic JIT - ADD<br />if (arg0.type != INT32) gotoslow_add;if (arg1.type != INT32)gotoslow_add;<br />R0 = arg0.data<br />R1 = arg1.data<br />Greedy Register Allocator<br />
    84. 84. Basic JIT - ADD<br />if (arg0.type != INT32) gotoslow_add;if (arg1.type != INT32)gotoslow_add;<br />R0 = arg0.data;R1 = arg1.data;<br />R2 = R0 + R1;<br />if (OVERFLOWED)gotoslow_add;<br />
    85. 85. Slow Paths<br />slow_add: Value result = runtime::Add(arg0, arg1);<br />
    86. 86. Inline<br />Out-of-line<br />if (arg0.type != INT32) gotoslow_add;if (arg1.type != INT32)gotoslow_add;R0 = arg0.data;R1 = arg1.data;R2 = R0 + R1; if (OVERFLOWED)gotoslow_add;rejoin:<br />slow_add: Value result = Interpreter::Add(arg0, arg1);<br />R2 = result.data;<br />goto rejoin;<br />
    87. 87. Rejoining<br />Inline<br />Out-of-line<br /> if (arg0.type != INT32) gotoslow_add; if (arg1.type != INT32)gotoslow_add; R0 = arg0.data; R1 = arg1.data; R2 = R0 + R1; if (OVERFLOWED)gotoslow_add;<br />R3 = TYPE_INT32;rejoin:<br />slow_add: Value result = runtime::Add(arg0, arg1);<br /> R2 = result.data;<br /> R3 = result.type;<br />goto rejoin;<br />
    88. 88. Final Code<br />Inline<br />Out-of-line<br />if (arg0.type != INT32) goto slow_add; if (arg1.type != INT32)goto slow_add;R0 = arg0.data;R1 = arg1.data;R2 = R0 + R1; if (OVERFLOWED)goto slow_add;<br />R3 = TYPE_INT32;rejoin:<br />return Value(R3, R2);<br />slow_add: Value result = Interpreter::Add(arg0, arg1);<br />R2 = result.data;<br />R3 = result.type;<br />goto rejoin;<br />
    89. 89. Observations<br />Even though we have fast paths, no type information can flow in between opcodes<br />Cannot assume result of ADD is integer<br />Means more branches<br />Types increase register pressure<br />
    90. 90. Basic JIT – Object Access<br /><ul><li>x.yis multiple dictionary searches!
    91. 91. How can we create a fast-path?
    92. 92. We could try to cache the lookup, but
    93. 93. We want it to work for >1 objects</li></ul>functionf(x) {returnx.y;}<br />
    94. 94. Object Access<br />Observation<br />Usually, objects flowing through an access site look the same<br />If we knew an object’s layout, we could bypass the dictionary lookup<br />
    95. 95. Object Layout<br />var obj = { x: 50,<br /> y: 100,<br /> z: 20<br /> };<br />
    96. 96. Object Layout<br />
    97. 97. Object Layout<br />varobj = { x: 50,<br /> y: 100,<br /> z: 20<br /> };<br />…<br />varobj2 = { x: 78,<br /> y: 93,<br /> z: 600<br /> };<br />
    98. 98. Object Layout<br />
    99. 99. Familiar Problems<br />We don’t know an object’s shape during JIT compilation<br />... Or even that a property access receives an object!<br />Solution: leave code “blank,” lazily generate it later<br />
    100. 100. Inline Caches<br />functionf(x) {returnx.y;}<br />
    101. 101. Inline Caches<br /><ul><li>Start as normal, guarding on type and loading data</li></ul>if (arg0.type != OBJECT)gotoslow_property;<br />JSObject *obj = arg0.data;<br />
    102. 102. <ul><li>No information yet: leave blank (nops).</li></ul> if (arg0.type != OBJECT)gotoslow_property;<br />JSObject *obj = arg0.data;<br />gotoproperty_ic;<br />Inline Caches<br />
    103. 103. Property ICs<br />functionf(x) {returnx.y;}<br />f({x: 5, y: 10, z: 30});<br />
    104. 104. Property ICs<br />
    105. 105. Property ICs<br />
    106. 106. Property ICs<br />
    107. 107. Inline Caches<br /><ul><li>Shape = S1, slot is 1</li></ul> if (arg0.type != OBJECT)gotoslow_property;<br />JSObject *obj = arg0.data;<br />gotoproperty_ic;<br />
    108. 108. Inline Caches<br /><ul><li>Shape = S1, slot is 1</li></ul> if (arg0.type != OBJECT)gotoslow_property;<br />JSObject *obj = arg0.data;<br />gotoproperty_ic;<br />
    109. 109. Inline Caches<br /><ul><li>Shape = SHAPE1, slot is 1</li></ul> if (arg0.type != OBJECT)gotoslow_property;<br />JSObject *obj = arg0.data;<br /> if (obj->shape != SHAPE1)<br />gotoproperty_ic;<br />R0 = obj->slots[1].type;<br />R1 = obj->slots[1].data;<br />
    110. 110. Polymorphism<br /><ul><li>What happens if two different shapes pass through a property access?
    111. 111. Add another fast-path...</li></ul>functionf(x) {returnx.y;}f({ x: 5, y: 10, z: 30});f({ y: 40});<br />
    112. 112. Polymorphism<br />Main Method<br /> if (arg0.type != OBJECT)gotoslow_path;JSObject *obj = arg0.data;if (obj->shape != SHAPE1)gotoproperty_ic;R0 = obj->slots[1].type;R1 = obj->slots[1].data;rejoin:<br />
    113. 113. Polymorphism<br />Main Method<br />Generated Stub<br /> if (arg0.type != OBJECT)gotoslow_path;JSObject *obj = arg0.data; if (obj->shape != SHAPE0)gotoproperty_ic; R0 = obj->slots[1].type; R1 = obj->slots[1].data;rejoin:<br /> stub:<br />if (obj->shape != SHAPE2)gotoproperty_ic;<br />R0 = obj->slots[0].type;R1 = obj->slots[0].data;<br />goto rejoin;<br />
    114. 114. Polymorphism<br />Main Method<br />Generated Stub<br />if (arg0.type != OBJECT) goto slow_path; JSObject *obj = arg0.data;if (obj->shape != SHAPE1)goto stub; R0 = obj->slots[1].type; R1 = obj->slots[1].data;rejoin:<br /> stub:<br />if (obj->shape != SHAPE2)goto property_ic;<br />R0 = obj->slots[0].type;R1 = obj->slots[0].data;<br />goto rejoin;<br />
    115. 115. Chain of Stubs<br />if(obj->shape == S1) {<br /> result = obj->slots[0];<br />} else {<br />if (obj->shape == S2) {<br /> result = obj->slots[1];<br /> } else {<br />if (obj->shape == S3) {<br /> result = obj->slots[2];<br /> } else {<br />if (obj->shape == S4) {<br /> result = obj->slots[3];<br /> } else {<br />goto property_ic;<br /> }<br /> }<br /> }<br />}<br />
    116. 116. Code Memory<br /><ul><li>Generated code is patched from inside the method
    117. 117. Self-modifying, but single threaded
    118. 118. Code memory is always rwx
    119. 119. Concerned about protection-flipping expense</li></li></ul><li>Basic JIT Summary<br /><ul><li>Generates simple code to handle common cases
    120. 120. Inline caches adapt code based on runtime observations
    121. 121. Lots of guards, poor type information</li></li></ul><li>Optimizing Harder<br /><ul><li>Single pass too limited: we want to perform whole-method optimizations
    122. 122. We could generate an IR, but without type information...
    123. 123. Slow paths prevent most optimizations.</li></li></ul><li>Code Motion?<br />functionAdd(x, n) {<br />varsum = 0;for (vari = 0; i < n; i++)<br /> sum += x + n;<br /> return sum;}<br />The addition looks loop invariant.<br />Can we hoist it?<br />
    124. 124. Code Motion?<br />functionAdd(x, n) {<br />varsum = 0;<br />vartemp0 = x + n;for (vari = 0; i < n; i++)<br /> sum += temp0;<br /> return sum;}<br />Let’s try it.<br />
    125. 125. Wrong Result<br />var global =0;varobj= {valueOf: function () {return++global; }}Add(obj, 10);<br /><ul><li>Original code returns 155
    126. 126. Hoisted version returns 110!</li></li></ul><li>Idempotency<br />Untyped operations are usually not provably idempotent<br />Slow paths are re-entrant, can have observable side effects<br />This prevents code motion, redundancy elimination<br />
    127. 127. Ideal Scenario<br /><ul><li>Actual knowledge about types!
    128. 128. Remove slow paths
    129. 129. Perform whole-method optimizations</li></li></ul><li>Advanced JITs<br /><ul><li>Make optimistic guesses about types
    130. 130. Guesses must be informed
    131. 131. Don’t want to waste time compiling code that can’t or won’t run
    132. 132. Using type inference or runtime profiling
    133. 133. Generate an IR, perform textbook compiler optimizations</li></li></ul><li>Optimism Pays Off<br />People naturally write code as if it were typed<br />Variables and object fields that change types are rare<br />
    134. 134. IonMonkey<br /><ul><li>Work in progress
    135. 135. Constructs high and low-level IRs
    136. 136. Applies type information using runtime feedback
    137. 137. Inlining, GVN, LICM, LSRA</li></li></ul><li>GETARG 0 ; fetch x<br />GETARG 1 ; fetch y<br />ADD<br />GETARG 0 ; fetch x<br />ADD<br />RETURN<br />functionAdd(x, y) {return x + y + x;}<br />
    138. 138. Build SSA<br />GETARG 0 ; fetch x<br />GETARG 1 ; fetch y<br />ADD<br />GETARG 0 ; fetch x<br />ADD<br />RETURN<br />v0 = arg0<br />v1 = arg1<br />
    139. 139. Build SSA<br />GETARG 0 ; fetch x<br />GETARG 1 ; fetch y<br />ADD<br />GETARG 0 ; fetch x<br />ADD<br />RETURN<br />v0 = arg0<br />v1 = arg1<br />
    140. 140. Type Oracle<br />Need a mechanism to inform compiler about likely types<br />Type Oracle: given program counter, returns types for inputs and outputs<br />May use any data source – runtime profiling, type inference, etc <br />
    141. 141. Type Oracle<br />For this example, assume the type oracle returns “integer” for the output and both inputs to all ADD operations<br />
    142. 142. Build SSA<br />GETARG 0 ; fetch x<br />GETARG 1 ; fetch y<br />ADD<br />GETARG 0 ; fetch x<br />ADD<br />RETURN<br />v0 = arg0<br />v1 = arg1<br />v2 = iadd(v0, v1)<br />
    143. 143. Build SSA<br />GETARG 0 ; fetch x<br />GETARG 1 ; fetch y<br />ADD<br />GETARG 0 ; fetch x<br />ADD<br />RETURN<br />v0 = arg0<br />v1 = arg1<br />v2 = iadd(v0, v1)<br />v3 = iadd(v2, v0)<br />
    144. 144. Build SSA<br />GETARG 0 ; fetch x<br />GETARG 1 ; fetch y<br />ADD<br />GETARG 0 ; fetch x<br />ADD<br />RETURN<br />v0 = arg0<br />v1 = arg1<br />v2 = iadd(v0, v1)<br />v3 = iadd(v2, v0)<br />v4 = return(v3)<br />
    145. 145. SSA (Intermediate)<br />v0 = arg0<br />v1 = arg1<br />v2 = iadd(v0, v1)<br />v3 = iadd(v2, v0)<br />v4 = return(v3)<br />Untyped<br />Typed<br />
    146. 146. Intermediate SSA<br /><ul><li>SSA does not type check
    147. 147. Type analysis:
    148. 148. Makes sure SSA inputs have correct type
    149. 149. Inserts conversions, value decoding</li></li></ul><li>Unoptimized SSA<br />v0 = arg0<br />v1 = arg1<br />v2 = unbox(v0, INT32)<br />v3 = unbox(v1, INT32)<br />v4 = iadd(v2, v3)<br />v5 = unbox(v0, INT32)<br />v6 = iadd(v4, v5)<br />v7 = return(v6)<br />
    150. 150. Optimization<br />v0 = arg0<br />v1 = arg1<br />v2 = unbox(v0, INT32)<br />v3 = unbox(v1, INT32)<br />v4 = iadd(v2, v3)<br />v5 = unbox(v0, INT32)<br />v6 = iadd(v4, v5)<br />v7 = return(v6)<br />
    151. 151. Optimization<br />v0 = arg0<br />v1 = arg1<br />v2 = unbox(v0, INT32)<br />v3 = unbox(v1, INT32)<br />v4 = iadd(v2, v3)<br />v5 = unbox(v0, INT32)<br />v6 = iadd(v4, v5)<br />v7 = return(v6)<br />
    152. 152. Optimized SSA<br />v0 = arg0<br />v1 = arg1<br />v2 = unbox(v0, INT32)<br />v3 = unbox(v1, INT32)<br />v4 = iadd(v2, v3)<br />v5 = iadd(v4, v2)<br />v6 = return(v5)<br />
    153. 153. Register Allocation<br />Linear Scan Register Allocation<br />Based on Christian Wimmer’s work<br />Linear Scan Register Allocation for the Java HotSpot™ Client Compiler. Master's thesis, Institute for System Software, Johannes Kepler University Linz, 2004<br />Linear Scan Register Allocation on SSA Form. In Proceedings of the International Symposium on Code Generation and Optimization, pages 170–179. ACM Press, 2010.<br />Same algorithm used in Hotspot JVM<br />
    154. 154. Allocate Registers<br />0: stack arg0<br />1: stack arg1<br />2: r1 unbox(0, INT32)<br />3: r0 unbox(1, INT32)<br />4: r0 iadd(2, 3)<br />5: r0 iadd(4, 2)<br />6: r0 return(5)<br />
    155. 155. Code Generation<br />SSA<br />0: stack arg0<br />1: stack arg1<br />2: r1 unbox(0, INT32)<br />3: r0 unbox(1, INT32)<br />4: r0 iadd(2, 3)<br />5: r0 iadd(4, 2)<br />6: r0 return(5)<br />
    156. 156. Code Generation<br />SSA<br />Native Code<br />0: stack arg0<br />1: stack arg1<br />2: r1unbox(0, INT32)<br />3: r0 unbox(1, INT32)<br />4: r0 iadd(2, 3)<br />5: r0 iadd(4, 2)<br />6: r0 return(5)<br />if (arg0.type != INT32)goto BAILOUT; r1 = arg0.data;<br />
    157. 157. Code Generation<br />SSA<br />Native Code<br />0: stack arg0<br />1: stack arg1<br />2: r1 unbox(0, INT32)<br />3: r0unbox(1, INT32)<br />4: r0 iadd(2, 3)<br />5: r0 iadd(4, 2)<br />6: r0 return(5)<br /> if (arg0.type != INT32) goto BAILOUT; r1 = arg0.data;<br />if (arg1.type != INT32)goto BAILOUT; r0 = arg1.data;<br />
    158. 158. Code Generation<br />SSA<br />Native Code<br />0: stack arg0<br />1: stack arg1<br />2: r0 unbox(0, INT32)<br />3: r1 unbox(1, INT32)<br />4: r0iadd(2, 3)<br />5: r0 iadd(4, 2)<br />6: r0 return(5)<br /> if (arg0.type != INT32) goto BAILOUT; r0 = arg0.data;<br /> if (arg1.type != INT32) goto BAILOUT; r1 = arg1.data;<br /> r0 = r0 + r1;<br />if (OVERFLOWED)<br />goto BAILOUT;<br />
    159. 159. Code Generation<br />SSA<br />Native Code<br />0: stack arg0<br />1: stack arg1<br />2: r0 unbox(0, INT32)<br />3: r1 unbox(1, INT32)<br />4: r0 iadd(2, 3)<br />5: r0iadd(4, 2)<br />6: r0 return(5)<br /> if (arg0.type != INT32) goto BAILOUT; r0 = arg0.data;<br /> if (arg1.type != INT32) goto BAILOUT; r1 = arg1.data;<br /> r0 = r0 + r1;<br /> if (OVERFLOWED)<br /> goto BAILOUT;<br /> r0 = r0 + r1;<br />if (OVERFLOWED)<br />goto BAILOUT;<br />
    160. 160. Code Generation<br />SSA<br />Native Code<br />0: stack arg0<br />1: stack arg1<br />2: r0 unbox(0, INT32)<br />3: r1 unbox(1, INT32)<br />4: r0 iadd(2, 3)<br />5: r0 iadd(4, 2)<br />6: r0 return(4)<br /> if (arg0.type != INT32) goto BAILOUT; r0 = arg0.data;<br /> if (arg1.type != INT32) goto BAILOUT; r1 = arg1.data;<br /> r0 = r0 + r1;<br /> if (OVERFLOWED)<br /> goto BAILOUT;<br /> r0 = r0 + r1;<br /> if (OVERFLOWED)<br /> goto BAILOUT;<br /> return r0;<br />
    161. 161. Generated Code<br /> if (arg0.type != INT32)goto BAILOUT; r0 = arg0.data; if (arg1.type != INT32)goto BAILOUT; r1 = arg1.data; r0 = r0 + r1; if (OVERFLOWED)goto BAILOUT;<br /> r0 = r0 + r1; if (OVERFLOWED)goto BAILOUT;return r0;<br />
    162. 162. Generated Code<br /><ul><li>Bailout exits JIT
    163. 163. May recompile function</li></ul> if (arg0.type != INT32)goto BAILOUT; r0 = arg0.data; if (arg1.type != INT32)goto BAILOUT; r1 = arg1.data; r0 = r0 + r1; if (OVERFLOWED)goto BAILOUT;<br /> r0 = r0 + r1; if (OVERFLOWED)goto BAILOUT;return r0;<br />
    164. 164. Guards Still Needed<br />Object shapes for reading properties<br />Types of…<br />Arguments<br />Values read from the heap<br />Values returned from C++<br />
    165. 165. Guards Still Needed<br />Math<br />Integer Overflow<br />Divide by Zero<br />Multiply by -0<br />NaN is falseyin JS, truthy in ISAs<br />
    166. 166. Advanced JITs<br /><ul><li>Full type speculation - no slow paths
    167. 167. Can move and eliminate code
    168. 168. If speculation fails, method is recompiled or deoptimized
    169. 169. Can still use techniques like inline caching!</li></li></ul><li>To Infinity, and Beyond<br /><ul><li>Advanced JIT example has four guards:
    170. 170. Two type checks
    171. 171. Two overflow checks
    172. 172. Better than Basic JIT, but we can do better
    173. 173. Interval analysis can remove overflow checks
    174. 174. Type Inference!</li></li></ul><li>Type Inference<br /><ul><li>Whole program analysis to determine types of variables
    175. 175. Hybrid: both static and dynamic
    176. 176. Replaces checks in JIT with checks in virtual machine
    177. 177. If VM breaks an assumption held in any JIT code, that JIT code is discarded</li></li></ul><li>Conclusions<br />The web needs JavaScript to be fast<br />Untyped, highly dynamic nature makes this challenging<br />Value boxing has different tradeoffs on different ISAs<br />GC needs to be fast – what should we focus on?<br />
    178. 178. Conclusions<br />JITs getting more powerful<br />Type specialization is key<br />Still need checks and special cases on many operations<br />
    179. 179. Questions?<br />

    ×