Android JIT
Agenda:
  • Introduction
  • Just-In-Time (JIT)/Dynamic Compilation
  • JIT Design
  • Dalvik JIT
  • JIT Compiler
  • Intermediate Representation
  • Optimization Techniques
  • Data-/Control-Flow Analysis
Introduction:
The Java language is made to be interpreted to achieve the critical goal of application portability. A source file (e.g. HW.java, possibly referencing other classes) is compiled by javac into a class file (HW.class) containing bytecode (e.g. ca 08 fe 1a ba 42 be ..), which the Java Virtual Machine executes. Just as microprocessors have instruction sets that define the operations they can perform, the VM has an instruction set that Java compiles into, in a format known as bytecodes. It is through the VM that executable bytecode Java classes are executed and ultimately routed to the appropriate native system calls.
Problem: "A Java program executing within the VM is executed one bytecode at a time."
Problem (Contd.):
The conventional approach resulted in significantly lower performance than compiled languages like C/C++, owing to the additional processor and memory usage during interpretation. As a result, slow and space-constrained computing devices tended not to include virtual computing technology (i.e. a JVM).
Initiatives:
  • JSR-30: J2ME CLDC (Connected Limited Device Configuration) Specification
  • Reference implementation of the J2ME CLDC in April 1999; approved in August 1999
  • Final public release of CLDC 1.0 in May 2003
The HotSpot engine was developed to address the perception that Java virtual machine performance was insufficient for many mainstream applications. By implementing a host of performance-enhancing techniques that went beyond innovations like just-in-time (JIT) compilers, the performance of the Java virtual machine increased by an order of magnitude.
Just-In-Time (JIT)/Dynamic Compilation:
The Just-In-Time (JIT) compiler is a component of the Java Runtime Environment. It improves the performance of Java applications by compiling bytecodes to native machine code at run time. Inside the JVM, bytecodes flow into the JIT compiler, whose main components are an Intermediate Representation Generator, an Optimizer, a Profiler, and a Code Generator, working alongside the runtime and the garbage collector (GC).
Just-In-Time (JIT)/Dynamic Compilation (Contd.):
JIT Compilation Strategies:
With a JIT compiler, Java programs are compiled one block of code at a time as they execute into the native processor's instructions to achieve higher performance. The process involves generating an internal representation of a method that is different from bytecodes but at a higher level than the target processor's native instructions. The compiler performs optimization to improve quality and efficiency, and finally a code-generation step translates the optimized internal representation into the target processor's native instructions.
To avoid the overhead of compiling and optimizing all of an application's classes at once, a number of incremental compilation strategies have evolved. The general strategy of compiling only the "hot" parts of an application will often result in only a small percentage of the application being compiled, thus saving considerable compilation time.
"A continuously operating sampling profiler identifies a program's hot regions for code reoptimization."
"The JIT compiler operates on a compilation thread that's separate from the application threads, so the application doesn't need to wait for a compilation to occur."
Just-In-Time (JIT)/Dynamic Compilation (Contd.):
A Java class that has been loaded into memory by the VM contains a V-table (virtual table), which is a list of the addresses of all the methods in the class. Each address in the V-table points to the executable bytecode for the particular method.
Just-In-Time (JIT)/Dynamic Compilation (Contd.):
When the JIT is loaded, each bytecode address in the V-table is replaced with the address of the JIT compiler itself. When the VM calls a method through the address in the V-table, the JIT compiler is executed instead.
Just-In-Time (JIT)/Dynamic Compilation (Contd.):
The JIT compiler steps in, compiles the Java bytecode into native code, and then patches the native-code address back into the V-table. From then on, each call to the method results in a call to the native version.
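The compile-once-and-patch behavior described above can be sketched in plain Java. This is a toy model, not Dalvik or HotSpot internals: the `VTableDemo` class, its `install` helper, and the use of `IntUnaryOperator` as a stand-in for a v-table slot are all illustrative assumptions.

```java
import java.util.function.IntUnaryOperator;

// Toy v-table: a slot starts out pointing at a stub that "compiles" the
// method on first call and patches the slot with the compiled version.
public class VTableDemo {
    static IntUnaryOperator[] vtable = new IntUnaryOperator[1];
    static int compileCount = 0;

    static void install(int slot) {
        // Stub playing the JIT's role: compile once, then patch the slot.
        vtable[slot] = x -> {
            compileCount++;                            // "compilation" happens here
            IntUnaryOperator nativeCode = y -> y * y;  // the compiled method body
            vtable[slot] = nativeCode;                 // patch v-table with the native address
            return nativeCode.applyAsInt(x);           // run the freshly compiled code
        };
    }

    public static void main(String[] args) {
        install(0);
        int a = vtable[0].applyAsInt(5);  // first call goes through the stub
        int b = vtable[0].applyAsInt(6);  // later calls hit the patched slot directly
        System.out.println(a + " " + b + " compiled=" + compileCount);
    }
}
```

Only the first call pays the compilation cost; every later call dispatches straight to the patched entry, which is the point of the V-table patching scheme.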
JIT Design:
Challenges (price of platform neutrality):
The time it takes to compile the code is added to the program's running time. JIT compilation typically causes a slight delay in the initial execution of an application, due to the time taken to load and compile the bytecode.
Optimizations:
Modern JIT compilers take one of two approaches:
1. Compile all the code, but without performing any expensive analyses or transformations, so that code is generated quickly.
2. Devote compilation resources to only a small number of methods that execute frequently.
A further option is to combine interpretation and JIT compilation: the application code is initially interpreted, but the JVM monitors which sequences of bytecode are frequently executed and translates them to machine code for direct execution on the hardware.
JIT Design (Contd.):
There are four reasons why a JIT for the complete bytecode set was not implemented, making the combined use of interpreter and JIT unavoidable.
1. If thread context switching had to be performed while executing generated native code, this would have added complexity to code generation, runtime support, and the base VM code. By performing context switching only in the interpreter, no changes were needed to the way thread scheduling was done in the VM.
2. The generated machine code would have needed to be more rigorous in the way it dealt with error conditions and other exceptional conditions. As it is, the machine code only needs to check for error conditions; when they occur, the error-handling bytecodes can be executed by the interpreter, which then deals with the details of how the error should be processed.
3. A complete JIT would have required more complicated interactions between the generated machine code and the virtual machine as a whole. For example, the generated machine code could cause the compiler, class loader, garbage collector, or native code to run.
In retrospect some of these restrictions were not strictly necessary, but the system probably has fewer undiscovered bugs, and they do not seem to have limited the performance of the kind of compute-intensive software that is the target of the design. (Contd.)
JIT Design (Contd.):
4. A debugging technique (discussed below) was used which could not have been employed so easily with a complete JIT.
Therefore the system was designed to allow execution to pass from the compiled code to the interpreter at any time, and also for the interpreter to be able to return to generated code in a timely fashion. Additionally, to keep the interpreter from getting trapped in a long loop of bytecodes, it was necessary to be able to return to compiled code in the middle of a method as well as at the start.
"The JIT lets the interpreter deal with complex tasks such as class loading, exception handling, synchronization, garbage collection, etc."
The basic interpreter loop is as follows:

    Start: Try to enter compiled code.
           Interpret the next bytecode.
           goto Start.

If the current method has not been compiled, checks are performed to determine whether it can be.
JIT Design (Contd.):
Compilation may not be possible for one of the following reasons:
1. A native function was called.
2. The method has more than a certain number of parameters or local variables, or is unusually large.
3. There is no memory available for more compiled code.
4. An object could not be created without running the garbage collector.
5. An operation was attempted that required a class to be initialized.
6. The start of an exception handler was reached.
7. An exception or error occurred. The interpreter always processes these.
8. A part of a method was reached for which no corresponding machine code could be generated.
9. A function was called for which there was no compiled code.
10. A method return was executed, but there was no compiled code to return to because the code buffer had been flushed.
JIT Design (Contd.):
1. The JVM interprets a method until its call count exceeds a JIT threshold.
2. After a method is compiled, its call count is reset to zero; subsequent calls to the method continue to increment its count.
3. When the call count of a method reaches a JIT recompilation threshold, the JIT compiles it a second time, this time applying a larger selection of optimizations than on the previous compilation (because the method has proven to be a significant part of the whole program).
(Diagram: a V-table whose entries for Methods 1-4 point to their bytecode, feeding into the Just-In-Time compiler.)
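The threshold mechanism in steps 1-3 can be sketched as a tiny Java model. All names here (`ThresholdDemo`, `JIT_THRESHOLD`, `invoke`) are hypothetical; a lambda stands in for generated native code.

```java
import java.util.function.IntUnaryOperator;

// Sketch of threshold-driven compilation: a method is "interpreted" until
// its call count crosses JIT_THRESHOLD, after which a compiled version is
// installed and the count reset (ready for a recompilation threshold).
public class ThresholdDemo {
    static final int JIT_THRESHOLD = 10;
    static int callCount = 0;
    static IntUnaryOperator compiled = null;   // null means not yet compiled

    static int invoke(int x) {
        if (compiled != null) return compiled.applyAsInt(x);  // fast path
        callCount++;
        if (callCount >= JIT_THRESHOLD) {
            compiled = y -> y + 1;             // stand-in for generated native code
            callCount = 0;                     // count is reset after compilation
        }
        return x + 1;                          // "interpreted" execution
    }

    public static void main(String[] args) {
        for (int i = 0; i < 12; i++) invoke(i);
        System.out.println(compiled != null);  // the method went hot and was compiled
    }
}
```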
JIT Design (Contd.):
(Diagram: with JIT=OFF, the JVM interprets the .class file directly. With JIT=ON and Threshold=10, a method called fewer than 10 times is interpreted by the JVM, while a method called 10 or more times is compiled to native code that runs directly on the operating system.)
Dalvik JIT:
Dalvik Execution Environment:
1. Register-based architecture (register machine): stack-based machines (JVMs) must use instructions to load data onto the stack and manipulate that data, and thus require more instructions than register machines.
2. Very compact representation: Java bytecode is converted into an alternate instruction set used by the Dalvik VM. dx is a tool used to convert some (but not all) Java .class files into the .dex format.
3. Emphasis on code/data sharing to reduce memory usage: multiple classes are included in a single .dex file.
4. A highly tuned, very fast Dalvik interpreter (roughly 2x similar interpreters), good enough for most applications. For compute-intensive applications, the Native Development Kit was released to allow Dalvik applications to call out to statically compiled (native) methods.
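The register-vs-stack point can be made concrete with two toy evaluators for y = x + 5. This is an illustration, not real JVM or Dalvik bytecode execution; `MachineDemo` and both methods are hypothetical names, and the instruction mnemonics in the comments are only analogies.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Toy comparison: computing y = x + 5 takes four stack-machine
// instructions but only one register-machine instruction.
public class MachineDemo {
    // Stack machine, analogous to: iload x; iconst 5; iadd; istore y
    static int stackMachine(int x) {
        Deque<Integer> stack = new ArrayDeque<>();
        stack.push(x);                          // iload x
        stack.push(5);                          // iconst 5
        stack.push(stack.pop() + stack.pop()); // iadd
        return stack.pop();                     // istore y
    }

    // Register machine, analogous to a single Dalvik-style: add-int y, x, #5
    static int registerMachine(int x) {
        return x + 5;
    }

    public static void main(String[] args) {
        System.out.println(stackMachine(7) + " " + registerMachine(7)); // same result
    }
}
```

Both produce the same value; the register form simply encodes the operands directly in one instruction instead of shuffling them through an operand stack.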
Dalvik JIT (Contd.):
The other part of the solution is the Dalvik JIT, which translates bytecode to optimized native code at run time. There are two models:
1. Method Compiler
2. Trace Compiler

1. Method Compiler
- Most common model for server JITs
- Interprets with profiling to detect hot methods
- Compiles & optimizes method-sized chunks
- Strengths:
  • Larger optimization window
  • Machine state syncs with the interpreter only at method-call boundaries
- Weaknesses:
  • Cold code within hot methods gets compiled
  • Much higher memory usage during compilation & optimization
  • Longer delay between the point at which a method goes hot and the point at which the compiled and optimized method delivers benefits
Dalvik JIT (Contd.):
2. Trace Compiler
- Most common model for low-level code-migration systems
- Interprets with profiling to identify hot execution paths
- Compiled fragments are chained together in a translation cache
- Strengths:
  • Only the hottest of hot code is compiled, minimizing memory usage
  • Tight integration with the interpreter allows focus on common cases
  • Very rapid return of performance boost once hotness is detected
- Weaknesses:
  • Smaller optimization window limits peak gain
  • More frequent state synchronization with the interpreter
  • Difficult to share the translation cache across processes
Dalvik JIT (Contd.):
Method vs. Trace:
  • Method JIT: best optimization window
  • Trace JIT: best speed/space tradeoff
Example figures: full program 4,695,780 bytes; hot methods 396,230 bytes (8% of the program); hot traces 26% of the hot methods (2% of the program).
Dalvik JIT (Contd.):
The provisional decision was to start with trace compilation for the following reasons:
• Minimizing memory usage is critical for mobile devices
• It is important to deliver the performance boost quickly (the user might give up on a new app if we wait too long to JIT)
• It leaves open the possibility of supplementing with a method-based JIT:
  - The two styles can co-exist
  - A mobile device looks more like a server when it's plugged in
  - Best of both worlds: trace JIT when running on battery, method JIT in the background while charging
The Dalvik JIT can be considered an extension of the interpreter, because it is the interpreter that profiles and triggers trace-selection mode when a potential trace head goes hot.
Dalvik JIT (Contd.):
Dalvik Trace JIT Flow (flowchart): starting from Start, the interpreter runs until the next potential trace head. If a translation for that location already exists in the translation cache, it is executed; otherwise the profile count for the location is updated. If the count has not reached the translation threshold, interpretation simply continues. If it has, the interpreter switches to interpret/build-trace-request mode and submits a compilation request to the compiler thread, which compiles the trace and installs the new translation (with its exit points, Exit 0 and Exit 1) into the translation cache.
Dalvik JIT (Contd.):
Features:
• The trace request is built during interpretation
  - Allows access to actual run-time values
  - Ensures that a trace only includes bytecodes that have successfully executed at least once (useful for some optimizations)
• Trace requests are handed off to a compiler thread, which compiles and optimizes them into native code
• Compiled traces are chained together in the translation cache
• Per-process translation caches (sharing only within security sandboxes)
• Simple traces, generally 1 to 2 basic blocks long
• Local optimizations:
  - Register promotion
  - Load/store elimination
  - Redundant null-check elimination
  - Heuristic scheduling
• Loop optimizations:
  - Simple loop detection
  - Invariant code motion
  - Induction-variable optimization
JIT Compiler:
JIT Compiler Work Flow:
In order to execute bytecode, the JIT compiler goes through three stages:
1. Baseline: generates code that is "obviously correct". The process involves generating an internal representation of the Java code that is different from bytecodes but at a higher level than the target processor's native instructions: the Intermediate Representation (IR). "The IR allows more effective machine-specific optimizations."
2. Optimizing: applies a set of optimizations to a class when it is loaded at run time.
3. Adaptive: methods are compiled with a non-optimizing compiler first, and "hot" methods are then selected for recompilation based on run-time profiling information.
"A key part of the JIT design was to split the compilation process into two passes. The first pass transforms the standard, stack-based bytecodes into a simple 3-address intermediate representation in which all temporary statement results are placed into new local variables instead of entries on an evaluation stack. The second pass converts this three-address form into native machine code."
Intermediate Representation:
An IR instruction is an N-tuple (an ordered sequence of N elements), consisting of an operator and some number of operands.
"The Intermediate Representation is a machine- and language-independent version of the original source code."
The operator is the instruction to perform; operands are used to represent symbolic registers, physical registers, memory locations, constants, branch targets, method signatures, types, etc.
An IR must be convenient to translate into real assembly code for all desired target machines.
Intermediate Representation (contd.):
Three-Address Code (TAC or 3AC):
1. Three-address code is a form of intermediate representation (IR) used by compilers to aid in the implementation of code-improving transformations.
2. Each instruction in three-address code can be described as a 4-tuple (operator, operand1, operand2, result), written as

    result := operand1 operator operand2

such as x := y + z.
3. Expressions containing more than one fundamental operation, such as p = x + y * z, are not representable in three-address code as a single instruction. Instead, they are decomposed into an equivalent series of instructions, such as

    t1 := y * z
    p := x + t1

"The key features of three-address code are that every instruction implements exactly one fundamental operation, and that the source and destination may refer to any available register."
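The decomposition shown above can be sketched as a tiny expression-tree flattener in Java. The classes (`TacDemo`, `Expr`, `Var`, `BinOp`) are hypothetical illustrations, not part of any real compiler; each `BinOp` emits exactly one fundamental operation into a fresh temporary.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of TAC generation: flatten an expression tree so that every
// emitted instruction performs exactly one fundamental operation.
public class TacDemo {
    static List<String> code = new ArrayList<>();
    static int temp = 0;

    interface Expr { String emit(); }

    record Var(String name) implements Expr {
        public String emit() { return name; }
    }

    record BinOp(String op, Expr l, Expr r) implements Expr {
        public String emit() {
            String a = l.emit();               // operands are flattened first
            String b = r.emit();
            String t = "t" + (++temp);         // fresh temporary for this operation
            code.add(t + " := " + a + " " + op + " " + b);
            return t;
        }
    }

    public static void main(String[] args) {
        // p = x + y * z decomposes into one fundamental operation per line
        Expr e = new BinOp("+", new Var("x"),
                           new BinOp("*", new Var("y"), new Var("z")));
        code.add("p := " + e.emit());
        code.forEach(System.out::println);
    }
}
```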
Intermediate Representation (contd.):
Static Single Assignment form (SSA):
1. A refinement of three-address code and a property of an intermediate representation (IR) which requires that each variable be assigned exactly once.
2. Existing variables in the original IR are split into versions (new variables, typically indicated in textbooks by the original name with a subscript), so that every definition gets its own version.
Benefits (by example):

    TAC          SSA
    y := 1       y1 := 1
    y := 2       y2 := 2
    x := y       x := y2

1. Humans can see that the first assignment is unnecessary, and that the value of y used in the third line comes from the second assignment of y. A program would have to perform "reaching definitions analysis" to discover this and apply these optimizations.
2. With SSA, both facts are immediate: y1 is defined once and never used, so omitting it cannot affect any other part of the code.
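For straight-line code (no branches, hence no phi-functions), SSA construction is just systematic renaming, which can be sketched in a few lines of Java. `SsaDemo` and its simple `"dst := src"` string IR are hypothetical; note that unlike the slide's example, this sketch versions every definition, so `x` also becomes `x1`.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch: convert straight-line TAC of the form "dst := src" (where src is
// a variable or a literal) into SSA by versioning every definition.
public class SsaDemo {
    static List<String> toSsa(List<String> tac) {
        Map<String, Integer> version = new HashMap<>();
        List<String> out = new ArrayList<>();
        for (String insn : tac) {
            String[] parts = insn.split(" := ");
            String dst = parts[0], src = parts[1];
            // A use refers to the current (latest) version of the variable.
            if (version.containsKey(src)) src = src + version.get(src);
            // Each definition gets a fresh version number.
            int v = version.merge(dst, 1, Integer::sum);
            out.add(dst + v + " := " + src);
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> tac = List.of("y := 1", "y := 2", "x := y");
        toSsa(tac).forEach(System.out::println);
    }
}
```

In the output, y1 is trivially dead (defined, never used) and x1's value visibly comes from y2, exactly the two facts the slide says SSA makes immediate.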
Intermediate Representation (contd.):
Levels of IR (bytecode -> HIR -> MIR -> LIR -> machine code):
1. IRs that are close to a high-level language are called high-level IRs, and IRs that are close to assembly are called low-level IRs.
2. A high-level IR might preserve things like array subscripts or field accesses, whereas a low-level IR converts those into explicit addresses and offsets.
Example (for float a[10][20], the access a[i][j+2]):

    HIR:  t1 = a[i, j+2]

    MIR:  t1 = j + 2
          t2 = i * 20
          t3 = t1 + t2
          t4 = 4 * t3
          t5 = addr a
          t6 = t5 + t4
          t7 = *t6

    LIR:  r1 = [fp-4]
          r2 = r1 + 2
          r3 = [fp-8]
          r4 = r3 * 20
          r5 = r4 + r2
          r6 = 4 * r5
          r7 = fp - 216
          f1 = [r7+r6]
Intermediate Representation (contd.):
1. HIR (High-Level IR)
a) IR that is closer to the high-level language (operators similar to Java bytecode)
b) Usually preserves information such as loop structure and if-then-else statements
c) Operates on symbolic registers instead of an implicit stack

HIR Generation:

Java code (.java):

    class AdditionMethodTest {
        public static void main(String args[]) {
            int a = 3;
            int b = 4;
            int c = a + b;
            int d = getValue(c);
            return;
        } // End method main

        public static int getValue(int var) {
            return var * var;
        } // End method getValue
    }

Bytecode (.class):

    Method void main(java.lang.String[])
       0 iconst_3
       1 istore_1
       2 iconst_4
       3 istore_2
       4 iload_1
       5 iload_2
       6 iadd
       7 istore_3
       8 iload_3
       9 invokestatic #2 <Method int getValue(int)>
      12 istore 4
      14 return

    Method int getValue(int)
       0 iload_0
       1 iload_0
       2 imul
       3 ireturn
Intermediate Representation (contd.):
Conversion from Java bytecode to HIR:
The compiler that performs this conversion contains two parts:
1. The BC2IR algorithm, which translates bytecode to HIR and performs on-the-fly optimizations during translation.
2. Additional optimizations performed on the HIR after translation.
BC2IR translation:
1. Discovers extended basic blocks
2. Constructs an exception table for the method
3. Creates HIR instructions for bytecodes
4. Performs on-the-fly optimizations:
  a) Copy propagation
  b) Constant propagation
  c) Register renaming for local variables
  d) Dead-code elimination
  e) In-lining of short final or static methods
Note: even though these optimizations are also performed in later phases, doing them here reduces the size of the HIR generated and thus compile time.
Intermediate Representation (contd.):
Example of on-the-fly optimization, for y = x + 5:

    Java bytecode:
        iload x
        iconst 5
        iadd
        istore y

    Generated IR (optimization off):
        INT_ADD tint, xint, 5
        INT_MOVE yint, tint

    Generated IR (optimization on):
        INT_ADD yint, xint, 5

The copy-propagation algorithm can be seen at work here.
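The INT_MOVE elimination above can be sketched as a small peephole pass over a string-based TAC. `CopyPropDemo` and its IR syntax are illustrative assumptions; the pass relies on the moved source being a fresh temporary with no other uses, which holds for temporaries introduced during translation.

```java
import java.util.ArrayList;
import java.util.List;

// Peephole sketch: "t := <expr>" immediately followed by the move "y := t"
// collapses to "y := <expr>", mirroring INT_ADD t,x,5; INT_MOVE y,t
// becoming INT_ADD y,x,5. Assumes t has no other uses.
public class CopyPropDemo {
    static List<String> coalesceMoves(List<String> ir) {
        List<String> out = new ArrayList<>();
        for (String insn : ir) {
            String[] parts = insn.split(" := ");
            if (!out.isEmpty() && parts[1].matches("\\w+")) {      // a pure move
                String[] prev = out.get(out.size() - 1).split(" := ");
                if (prev[0].equals(parts[1])) {                    // it moves the previous result
                    out.set(out.size() - 1, parts[0] + " := " + prev[1]);
                    continue;                                      // move eliminated
                }
            }
            out.add(insn);
        }
        return out;
    }

    public static void main(String[] args) {
        // t := x + 5 ; y := t   ==>   y := x + 5
        System.out.println(coalesceMoves(List.of("t := x + 5", "y := t")));
    }
}
```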
Intermediate Representation (contd.):
The HIR generated for AdditionMethodTest.java:

    ********* START OF IR DUMP Initial HIR FOR AdditionMethodTest.main ([Ljava/lang/String;)V
    -13 LABEL0 Frequency: 0.0
     -2 EG ir_prologue l0i([Ljava/lang/String;,d) =
      1 int_move l1i(B) = 3
      3 int_move l2i(B) = 4
      7 int_move l3i(B) = 7
      9 EG call l5i(I) AF CF OF PF SF ZF = 66668, static"AdditionMethodTest.getValue (I)I", <unused>, 7
     -3 return <unused>
     -1 bbend BB0 (ENTRY)
    ********* END OF IR DUMP Initial HIR FOR AdditionMethodTest.main ([Ljava/lang/String;)V

    ********* START OF IR DUMP Initial HIR FOR AdditionMethodTest.getValue (I)I
    -13 LABEL0 Frequency: 0.0
     -2 EG ir_prologue l0i(I,d) =
      2 int_mul t2i(I) = l0i(I,d), l0i(I,d)
      3 int_move t1i(I) = t2i(I)
     -3 return t1i(I)
     -1 bbend BB0 (ENTRY)
    ********* END OF IR DUMP Initial HIR FOR AdditionMethodTest.getValue (I)I
Intermediate Representation (contd.):
Optimizations for HIR:
The following optimizers are provided for basic optimization:
1. CF:  Constant Folding
2. CPF: Constant Propagation and Folding (triggered by the propagation)
3. CSE: Common Sub-expression Elimination (within basic blocks)
4. DCE: Dead Code Elimination
5. GT:  Global Variable Temporalization (within basic blocks)
The optimizers CF and GT do not require data-flow analysis; however, CPF, CSE and DCE require some results of data-flow analysis.
A complete description is available at http://www.coins-project.org/international/COINSdoc.en/hiropt/hiropt.html
Intermediate Representation (contd.):
2. Medium-Level IRs (MIR)
a) Support a range of features in a set of source languages, but in a language-independent way
b) Good basis for generation of efficient machine code for one or more architectures. Example: register-transfer languages
3. Low-Level IRs (LIR)
a) Almost one-to-one correspondence to target-machine instructions: quite architecture-dependent
<MIR & LIR to be added>
Optimization Techniques:
Why Optimization:
1. Programmers do not always write optimal code; for example, ways to improve code are not always recognized (e.g. moving loop-invariant code out of loops, or avoiding re-computation of the same expression).
2. A high-level language may not allow a programmer to avoid redundant computation, or may make it inconvenient, as in a[i][j] = a[i][j] + 1.
3. The programmer should not be bothered with the target machine architecture. Moreover, modern machine architectures assume optimization; it has become hard to optimize by hand.
Goal: let programmers write clean, high-level source code, and produce programs that approach assembly-code performance.
Optimization is the transformation of a program P into a program P′ that has the same input/output behavior but is somehow "better": faster, smaller, uses less power, or whatever else you care about. Note that P′ is not optimal, and may even be worse than P.
Optimization Techniques:
1. In-lining (also at lower levels)
2. Specialization
3. Constant folding
4. Constant propagation
5. Value numbering
6. Dead-code elimination
7. Loop-invariant code motion
8. Common sub-expression elimination
9. Strength reduction
10. Branch prediction/optimization
11. Register allocation
12. Loop unrolling
13. Cache optimization
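Constant folding, the third technique in the list above, is easy to show concretely: any subexpression whose operands are all compile-time constants is evaluated by the compiler rather than at run time. The tree classes below (`FoldDemo`, `Const`, `Var`, `Add`) are hypothetical, not taken from any real JIT.

```java
// Constant-folding sketch: an addition whose operands are both constants
// is replaced by a single constant; everything else is left alone.
public class FoldDemo {
    interface Expr {}
    record Const(int v) implements Expr {}
    record Var(String name) implements Expr {}
    record Add(Expr l, Expr r) implements Expr {}

    static Expr fold(Expr e) {
        if (e instanceof Add a) {
            Expr l = fold(a.l()), r = fold(a.r());   // fold subtrees first
            if (l instanceof Const c1 && r instanceof Const c2)
                return new Const(c1.v() + c2.v());   // e.g. 2 + 3 becomes 5
            return new Add(l, r);
        }
        return e;                                    // constants and variables unchanged
    }

    public static void main(String[] args) {
        // x + (2 + 3)  ==>  x + 5
        Expr e = new Add(new Var("x"), new Add(new Const(2), new Const(3)));
        System.out.println(fold(e));
    }
}
```

Combined with constant propagation (which substitutes known constant values for variables), repeated folding passes can collapse whole expressions before any native code is generated.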