
Virtual machine and JavaScript engine



Introduction to virtual machine and JavaScript engine implementation.

Published in: Technology


  1. Virtual Machine & JavaScript Engine @nwind
  2. (HLL) Virtual Machine
  3. Take the red pill. I will show you the rabbit hole.
  4. Virtual Machine history • Pascal 1970 • Smalltalk 1980 • Self 1986 • Python 1991 • Java 1995 • JavaScript 1995
  5. The Smalltalk demonstration showed three amazing features. One was how computers could be networked; the second was how object-oriented programming worked. But Jobs and his team paid little attention to these attributes because they were so amazed by the third feature, ...
  6. How does a virtual machine work? • Parser • Intermediate Representation (IR) • Interpreter • Garbage Collection • Optimization
  7. Parser • Tokenize • AST
  8. Tokenize: var foo = 10; -> keyword (var), space, identifier (foo), space, equal (=), space, number (10), semicolon (;)
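The tokenizing step on this slide can be sketched in a few lines. This is only an illustration under stated assumptions: the token kinds mirror the labels on the slide, while the keyword set, the `TOKEN_SPEC` table and the `tokenize` function are hypothetical names, not taken from any real engine.

```python
import re

# Token kinds follow the slide's labels. The keyword set is a tiny,
# illustrative subset of JavaScript (an assumption of this sketch).
TOKEN_SPEC = [
    ("keyword", r"\b(?:var|function|return)\b"),
    ("identifier", r"[A-Za-z_$][A-Za-z0-9_$]*"),
    ("number", r"\d+"),
    ("equal", r"="),
    ("semicolon", r";"),
    ("space", r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(source):
    """Return (kind, text) pairs; raise on a character no rule matches."""
    tokens = []
    pos = 0
    while pos < len(source):
        match = MASTER.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at {pos}")
        tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens


for kind, text in tokenize("var foo = 10;"):
    print(kind, repr(text))
```

A real tokenizer would also handle strings, comments, regular expression literals and many more operators; the point here is only the mapping from characters to typed tokens.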
  9. AST: Assign -> Variable foo, Constant 10
  10. AST demo (Esprima) for var foo = bar + 1;
      {
        "type": "Program",
        "body": [
          {
            "type": "VariableDeclaration",
            "declarations": [
              {
                "id": { "type": "Identifier", "name": "foo" },
                "init": {
                  "type": "BinaryExpression",
                  "operator": "+",
                  "left": { "type": "Identifier", "name": "bar" },
                  "right": { "type": "Literal", "value": 1 }
                }
              }
            ],
            "kind": "var"
          }
        ]
      }
  11. Intermediate Representation • Bytecode • Stack vs. register
  12. Bytecode (SpiderMonkey)
      Source:
        function foo(bar) {
          return bar + 1;
        }
        foo(2);
      Bytecode:
        00000: deffun 0 null
        00005: nop
        00006: callvar 0
        00009: int8 2
        00011: call 1
        00014: pop
        00015: stop
      foo:
        00020: getarg 0
        00023: one
        00024: add
        00025: return
        00026: stop
  13. Bytecode (JSC)
      Source:
        function foo(bar) {
          return bar + 1;
        }
        foo(2);
      Bytecode:
        8 m_instructions; 168 bytes at 0x7fc1ba3070e0; 1 parameter(s); 10 callee register(s)
        [   0] enter
        [   1] mov r0, undefined(@k0)
        [   4] get_global_var r1, 5
        [   7] mov r2, undefined(@k0)
        [  10] mov r3, 2(@k1)
        [  13] call r1, 2, 10
        [  17] op_call_put_result r0
        [  19] end r0
        Constants:
          k0 = undefined
          k1 = 2
        3 m_instructions; 64 bytes at 0x7fc1ba306e80; 2 parameter(s); 1 callee register(s)
        [   0] enter
        [   1] add r0, r-7, 1(@k0)
        [   6] ret r0
        Constants:
          k0 = 1
        End: 3
  14. Stack vs. register • Stack: JVM, .NET, PHP, Python, old JavaScript engines • Register: Lua, Dalvik, all modern JavaScript engines; smaller, faster (about 30%); RISC
  15. Stack vs. register
      Stack:
        local a,t,i    1: PUSHNIL 3
        a=a+i          2: GETLOCAL 0 ; a
                       3: GETLOCAL 2 ; i
                       4: ADD
                       5: SETLOCAL 0 ; a
        a=a+1          6: GETLOCAL 0 ; a
                       7: ADDI 1
                       8: SETLOCAL 0 ; a
        a=t[i]         9: GETLOCAL 1 ; t
                       10: GETINDEXED 2 ; i
                       11: SETLOCAL 0 ; a
      Register:
        local a,t,i    1: LOADNIL 0 2 0
        a=a+i          2: ADD 0 0 2
        a=a+1          3: ADD 0 0 250 ; a
        a=t[i]         4: GETTABLE 0 1 2
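To make the difference concrete, here is a minimal sketch of both styles executing `a = a + i`. The opcode names echo the listing on the slide, but the tuple encoding and the `run_stack` / `run_register` helpers are assumptions made for illustration, not Lua's actual instruction format.

```python
def run_stack(code, local_vars):
    """Stack machine: operands are pushed and popped implicitly."""
    stack = []
    for op, arg in code:
        if op == "GETLOCAL":
            stack.append(local_vars[arg])
        elif op == "ADD":
            right = stack.pop()
            left = stack.pop()
            stack.append(left + right)
        elif op == "SETLOCAL":
            local_vars[arg] = stack.pop()
        else:
            raise ValueError(f"unknown opcode {op}")
    return local_vars


def run_register(code, registers):
    """Register machine: each instruction names its destination and operands."""
    for op, dst, lhs, rhs in code:
        if op == "ADD":
            registers[dst] = registers[lhs] + registers[rhs]
        else:
            raise ValueError(f"unknown opcode {op}")
    return registers


# a = a + i, with a in slot 0 and i in slot 2 (slot 1 holds t, unused here).
STACK_CODE = [("GETLOCAL", 0), ("GETLOCAL", 2), ("ADD", None), ("SETLOCAL", 0)]
REGISTER_CODE = [("ADD", 0, 0, 2)]

print(run_stack(STACK_CODE, [1, None, 41]))
print(run_register(REGISTER_CODE, [1, None, 41]))
print(len(STACK_CODE), "stack instructions vs", len(REGISTER_CODE), "register instruction")
```

The register form dispatches one instruction where the stack form dispatches four, at the cost of a wider instruction that carries three operands.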
  16. Interpreter • Switch statement • Direct threading, indirect threading, token threading ...
  17. Switch statement
      C:
        while (true) {
          switch (opcode) {
            case ADD:
              ...
              break;
            case SUB:
              ...
              break;
            ...
          }
          ...
        }
      Generated code:
        mov %edx,0xffffffffffffffe4(%rbp)
        cmpl $0x1,0xffffffffffffffe4(%rbp)
        je 6e <interpret+0x6e>
        cmpl $0x1,0xffffffffffffffe4(%rbp)
        jb 4a <interpret+0x4a>
        cmpl $0x2,0xffffffffffffffe4(%rbp)
        je 93 <interpret+0x93>
        ...
        jmp 22 <interpret+0x22>
        ...
  18. Direct threading
      C:
        typedef void *Inst;
        Inst program[] = { &&ADD, &&SUB };
        Inst *ip = program;
        goto *ip++;
        ADD:
          ...
          goto *ip++;
        SUB:
          ...
          goto *ip++;
      Generated code:
        mov 0xffffffffffffffe8(%rbp),%rdx
        lea 0xffffffffffffffe8(%rbp),%rax
        addq $0x8,(%rax)
        mov %rdx,0xffffffffffffffd8(%rbp)
        jmpq *0xffffffffffffffd8(%rbp)
        ADD:
        ...
        mov 0xffffffffffffffe8(%rbp),%rdx
        lea 0xffffffffffffffe8(%rbp),%rax
        addq $0x8,(%rax)
        mov %rdx,0xffffffffffffffd8(%rbp)
        jmp 2c <interpreter+0x2c>
  19. Garbage Collection • Reference counting (PHP, Python ...), smart pointers • Tracing: stop the world; copying, mark-and-sweep, mark-and-compact; generational GC; precise vs. conservative
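As a rough illustration of the tracing family listed above, the following is a minimal stop-the-world mark-and-sweep sketch. The `HeapObject` class and the `mark`, `sweep` and `collect` helpers are hypothetical names for this example; a real collector works on raw memory rather than Python objects.

```python
class HeapObject:
    """Heap object holding references to other heap objects."""

    def __init__(self, name):
        self.name = name
        self.refs = []
        self.marked = False


def mark(roots):
    """Mark phase: flag everything reachable from the roots."""
    worklist = list(roots)
    while worklist:
        obj = worklist.pop()
        if not obj.marked:
            obj.marked = True
            worklist.extend(obj.refs)


def sweep(heap):
    """Sweep phase: keep marked objects and clear marks for the next cycle."""
    survivors = [obj for obj in heap if obj.marked]
    for obj in survivors:
        obj.marked = False
    return survivors


def collect(heap, roots):
    mark(roots)
    return sweep(heap)


a, b, c, d = (HeapObject(n) for n in "abcd")
a.refs.append(b)   # a -> b is reachable from the root
c.refs.append(d)   # c <-> d is an unreachable cycle
d.refs.append(c)
live = collect([a, b, c, d], roots=[a])
print([obj.name for obj in live])
```

The unreachable cycle between `c` and `d` is reclaimed here, which is the classic case that plain reference counting cannot handle on its own.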
  20. Precise vs. conservative • Conservative: if it looks like a pointer, treat it as a pointer; might leak memory; can't move objects, so memory fragments • Precise: indirect vs. direct references
  21. It is time for the DARK Magic
  22. Optimization Magic • Interpreter optimization • Compiler optimization • JIT • Type inference • Hidden type • Method inlining, PICs
  23. Interpreter optimization
  24. Switch dispatch is inefficient. Why?
  25. CPU Pipeline • Fetch, Decode, Execute, Write-back • Branch prediction
  27. Solution: Inline Threading
      /* Each opcode body is delimited by a START and an END label. */
      ICONST_1_START: *sp++ = 1;
      ICONST_1_END: goto **(pc++);
      INEG_START: sp[-1] = -sp[-1];
      INEG_END: goto **(pc++);
      DISPATCH_START: goto **(pc++);
      DISPATCH_END: ;
      /* Measure the machine code of each body. */
      size_t iconst_size = (&&ICONST_1_END - &&ICONST_1_START);
      size_t ineg_size = (&&INEG_END - &&INEG_START);
      size_t dispatch_size = (&&DISPATCH_END - &&DISPATCH_START);
      /* Copy the bodies back to back, followed by a single dispatch. */
      void *buf = malloc(iconst_size + ineg_size + dispatch_size);
      char *current = buf;
      memcpy(current, &&ICONST_1_START, iconst_size); current += iconst_size;
      memcpy(current, &&INEG_START, ineg_size); current += ineg_size;
      memcpy(current, &&DISPATCH_START, dispatch_size);
      ...
      goto *buf;
      Interpreter? JIT!
  28. Compiler optimization
  29. Compiler optimization • SSA • Data-flow • Control-flow • Loop • ...
  30. What a JVM can do...
      compiler tactics: delayed compilation; tiered compilation; on-stack replacement; delayed reoptimization; program dependence graph representation; static single assignment representation
      proof-based techniques: exact type inference; memory value inference; memory value tracking; constant folding; reassociation; operator strength reduction; null check elimination; type test strength reduction; type test elimination; algebraic simplification; common subexpression elimination; integer range typing
      flow-sensitive rewrites: conditional constant propagation; dominating test detection; flow-carried type narrowing; dead code elimination
      language-specific techniques: class hierarchy analysis; devirtualization; symbolic constant propagation; autobox elimination; escape analysis; lock elision; lock fusion; de-reflection
      speculative (profile-based) techniques: optimistic nullness assertions; optimistic type assertions; optimistic type strengthening; optimistic array length strengthening; untaken branch pruning; optimistic N-morphic inlining; branch frequency prediction; call frequency prediction
      memory and placement transformation: expression hoisting; expression sinking; redundant store elimination; adjacent store fusion; card-mark elimination; merge-point splitting
      loop transformations: loop unrolling; loop peeling; safepoint elimination; iteration range splitting; range check elimination; loop vectorization
      global code shaping: inlining (graph integration); global code motion; heat-based code layout; switch balancing; throw inlining
      control flow graph transformation: local code scheduling; local code bundling; delay slot filling; graph-coloring register allocation; linear scan register allocation; live range splitting; copy coalescing; constant splitting; copy removal; address mode matching; instruction peepholing; DFA-based code generator
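One of the simplest entries in the list above, constant folding, can be sketched over a toy expression tree. The tuple-based AST and the `fold` function are assumptions made for this example; production compilers perform the same rewrite on their IR rather than on tuples.

```python
def fold(node):
    """Fold constant subexpressions in a tuple AST.

    Node shapes (hypothetical, for illustration only):
      ("num", value), ("var", name), ("add", left, right), ("mul", left, right)
    """
    kind = node[0]
    if kind in ("num", "var"):
        return node
    if kind not in ("add", "mul"):
        raise ValueError(f"unknown node kind {kind}")
    left, right = fold(node[1]), fold(node[2])
    if left[0] == "num" and right[0] == "num":
        # Both operands are known at compile time: compute the result now.
        value = left[1] + right[1] if kind == "add" else left[1] * right[1]
        return ("num", value)
    return (kind, left, right)


# bar + 2 * 3 becomes bar + 6; the variable operand blocks further folding.
tree = ("add", ("var", "bar"), ("mul", ("num", 2), ("num", 3)))
print(fold(tree))
```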
  31. Just-In-Time (JIT)
  32. JIT • Method JIT, trace JIT, regular expression JIT • Code generation • Register allocation
  33. How does a JIT work? • mmap/new/malloc (mprotect) • Generate native code • C cast/reinterpret_cast • Call the function
  34. Trampoline (JSC x86)
      // Execute the code!
      inline JSValue execute(RegisterFile* registerFile,
                             CallFrame* callFrame,
                             JSGlobalData* globalData)
      {
        JSValue result = JSValue::decode(
          ctiTrampoline(
            m_ref.m_code.executableAddress(),
            registerFile,
            callFrame,
            0,
            Profiler::enabledProfilerReference(),
            globalData));
        return globalData->exception ? jsNull() : result;
      }

      asm (
      ".text\n"
      ".globl " SYMBOL_STRING(ctiTrampoline) "\n"
      HIDE_SYMBOL(ctiTrampoline) "\n"
      SYMBOL_STRING(ctiTrampoline) ":" "\n"
        "pushl %ebp" "\n"
        "movl %esp, %ebp" "\n"
        "pushl %esi" "\n"
        "pushl %edi" "\n"
        "pushl %ebx" "\n"
        "subl $0x3c, %esp" "\n"
        "movl $512, %esi" "\n"
        "movl 0x58(%esp), %edi" "\n"
        "call *0x50(%esp)" "\n"
        "addl $0x3c, %esp" "\n"
        "popl %ebx" "\n"
        "popl %edi" "\n"
        "popl %esi" "\n"
        "popl %ebp" "\n"
        "ret" "\n"
      );
  35. Register allocation • Linear scan • Graph coloring
  36. Code generation • Pipelining • SIMD (SSE2, SSE3 ...) • Debug
  37. Type inference
  38. a+b
  39. Property access
  40. “”
  41. in C: 00001f63 movl %ecx,0x04(%edx)
  42. in JavaScript: __ZN2v88internal7HashMap6LookupEPvjb, which means v8::internal::HashMap::Lookup(void*, unsigned int, bool)
      __ZN2v88internal7HashMap6LookupEPvjb:
      00000338 pushl %ebp
      00000339 pushl %ebx
      0000033a pushl %edi
      0000033b pushl %esi
      0000033c subl $0x0c,%esp
      0000033f movl 0x20(%esp),%esi
      00000343 movl 0x08(%esi),%eax
      00000346 movl 0x0c(%esi),%ecx
      00000349 imull $0x0c,%ecx,%edi
      0000034c leal 0xff(%ecx),%ecx
      0000034f addl %eax,%edi
      00000351 movl 0x28(%esp),%ebx
      00000355 andl %ebx,%ecx
      00000357 imull $0x0c,%ecx,%ebp
      0000035a addl %eax,%ebp
      0000035c jmp 0x0000036a
      0000035e nop
      00000360 addl $0x0c,%ebp
      00000363 cmpl %edi,%ebp
      00000365 jb 0x0000036a
      00000367 movl 0x08(%esi),%ebp
      0000036a movl 0x00(%ebp),%eax
      0000036d testl %eax,%eax
      0000036f je 0x0000038b
      00000371 cmpl %ebx,0x08(%ebp)
      00000374 jne 0x00000360
      00000376 movl %eax,0x04(%esp)
      0000037a movl 0x24(%esp),%eax
      0000037e movl %eax,(%esp)
      00000381 call *0x04(%esi)
      00000384 testb %al,%al
      00000386 je 0x00000360
      00000388 movl 0x00(%ebp),%eax
      0000038b testl %eax,%eax
      0000038d jne 0x00000418
      00000393 cmpb $0x00,0x2c(%esp)
      00000398 jne 0x0000039e
      0000039a xorl %ebp,%ebp
      0000039c jmp 0x00000418
      0000039e movl 0x24(%esp),%eax
      000003a2 movl %eax,0x00(%ebp)
      000003a5 movl $0x00000000,0x04(%ebp)
      000003ac movl %ebx,0x08(%ebp)
      000003af movl 0x10(%esi),%eax
      000003b2 leal 0x01(%eax),%ecx
      000003b5 movl %ecx,0x10(%esi)
      000003b8 shrl $0x02,%ecx
      000003bb leal 0x01(%ecx,%eax),%eax
      ... 27 lines more
  43. How to optimize?
  44. Hidden Type: add property x, then add property y
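The idea of adding property x and then property y can be sketched as a chain of shared shape objects. This is a simplified model under stated assumptions: `HiddenClass`, `JSObject` and `ROOT` are hypothetical names, and real engines store far more per shape (attributes, prototypes, and so on).

```python
class HiddenClass:
    """Maps property names to slot indexes; shared by objects of the same shape."""

    def __init__(self, offsets=None):
        self.offsets = offsets or {}
        self.transitions = {}  # property name -> next HiddenClass

    def with_property(self, name):
        """Follow (or create) the transition taken when `name` is added."""
        if name not in self.transitions:
            offsets = dict(self.offsets)
            offsets[name] = len(offsets)
            self.transitions[name] = HiddenClass(offsets)
        return self.transitions[name]


ROOT = HiddenClass()  # the shape of an empty object


class JSObject:
    def __init__(self):
        self.hidden_class = ROOT
        self.slots = []

    def set(self, name, value):
        offsets = self.hidden_class.offsets
        if name in offsets:
            self.slots[offsets[name]] = value
        else:
            # New property: move to the next hidden class and grow the slots.
            self.hidden_class = self.hidden_class.with_property(name)
            self.slots.append(value)

    def get(self, name):
        return self.slots[self.hidden_class.offsets[name]]


p, q, r = JSObject(), JSObject(), JSObject()
p.set("x", 1); p.set("y", 2)
q.set("x", 3); q.set("y", 4)
r.set("y", 5); r.set("x", 6)   # same properties, different insertion order
print(p.hidden_class is q.hidden_class)
print(p.hidden_class is r.hidden_class)
```

Objects built in the same order share one hidden class, so a property read becomes a shape check plus an indexed load instead of a hash lookup. In this model, insertion order matters: `r` ends up with a different shape.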
  45. But nothing is perfect
  46. One secret of V8's hidden classes: 20x slower!
  47. But properties are rarely deleted: only 0.1% of events are deletes. "... reads are far more common than writes: over all traces the proportion of reads to writes is 6 to 1. Deletes comprise only .1% of all events." [Figure 5. Instruction mix. The per-site proportion of read, write, delete, call instructions (averaged over multiple traces).] Source: An Analysis of the Dynamic Behavior of JavaScript Programs
  48. Optimize method call
  49. bar can be anything: function foo(bar) { return; }
  50. Adaptive Optimization for SELF
  51. Polymorphic inline cache
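A polymorphic inline cache remembers, per call or access site, the few object shapes it has already seen and where the property lives for each. The sketch below is an assumption-laden model: `Shape`, `Instance`, `PropertySite` and the limit of four entries are illustrative choices, not values taken from any engine.

```python
class Shape:
    """Stand-in for a hidden class: property name -> slot index."""

    def __init__(self, offsets):
        self.offsets = offsets


class Instance:
    def __init__(self, shape, slots):
        self.shape = shape
        self.slots = slots


class PropertySite:
    """One property-access site with a small polymorphic inline cache."""

    MAX_ENTRIES = 4  # illustrative limit for this sketch

    def __init__(self, name):
        self.name = name
        self.cache = []        # list of (shape, slot index) pairs
        self.slow_lookups = 0

    def load(self, obj):
        for shape, index in self.cache:   # fast path: identity check on the shape
            if shape is obj.shape:
                return obj.slots[index]
        self.slow_lookups += 1            # slow path: full dictionary lookup
        index = obj.shape.offsets[self.name]
        if len(self.cache) < self.MAX_ENTRIES:
            self.cache.append((obj.shape, index))
        return obj.slots[index]


point = Shape({"x": 0, "y": 1})
flipped = Shape({"y": 0, "x": 1})
a, b, c = Instance(point, [1, 2]), Instance(point, [3, 4]), Instance(flipped, [5, 6])

site = PropertySite("x")
results = [site.load(obj) for obj in (a, b, c, a, c)]
print(results, "slow lookups:", site.slow_lookups)
```

Five loads cost only two slow lookups, one per distinct shape; a JIT would compile the cached checks directly into the generated code at that site.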
  52. Tagged pointer
  53. Tagged pointer
      typedef union {
        void *p;
        double d;
        long l;
      } Value;
      typedef struct {
        unsigned char type;
        Value value;
      } Object;
      Object a;
      sizeof(a)??
      If everything is an object, small integers carry too much overhead.
  54. Tagged pointer: on almost all systems, pointer addresses are aligned (to 4 or 8 bytes). “The address of a block returned by malloc or realloc in the GNU system is always a multiple of eight (or sixteen on 64-bit systems).”
  55. Tagged pointer example: 0xc00ab958. The pointer's last 2 or 3 bits must be 0, so the low bits can carry a tag: a final hex digit 8 (binary 1000) is a pointer, 9 (binary 1001) is a small number
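The low-bit tagging shown on the slide can be modelled with plain integers. This sketch assumes a 32-bit word and uses the slide's convention that a set low bit marks a small number; the function names are hypothetical, and real engines differ in which tag value they assign to integers.

```python
WORD_BITS = 32                      # assumed word size for this sketch
WORD_MASK = (1 << WORD_BITS) - 1


def box_int(value):
    """Encode a small integer: shift left by one and set the low tag bit."""
    if not -(1 << (WORD_BITS - 2)) <= value < (1 << (WORD_BITS - 2)):
        raise OverflowError("value does not fit in a tagged small integer")
    return ((value << 1) | 1) & WORD_MASK


def box_pointer(address):
    """Aligned addresses already have a zero low bit, so they are stored as-is."""
    if address & 1:
        raise ValueError("pointer must be aligned")
    return address


def is_int(word):
    return (word & 1) == 1


def unbox_int(word):
    value = word >> 1
    if value >= 1 << (WORD_BITS - 2):   # restore the sign of negative values
        value -= 1 << (WORD_BITS - 1)
    return value


print(hex(box_int(4)), is_int(box_int(4)))
print(hex(box_pointer(0xc00ab958)), is_int(0xc00ab958))
print(unbox_int(box_int(-1)))
```

The price is one bit of integer range; the gain is that small integers need no heap allocation and a type check is a single bit test.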
  56. How about double?
  57. NaN-tagging (JSC 64 bit)
      On a 64-bit system, pointers use only 48 bits, so the top 16 bits are 0.
       * The top 16-bits denote the type of the encoded JSValue:
       *
       *     Pointer {  0000:PPPP:PPPP:PPPP
       *              / 0001:****:****:****
       *     Double  {         ...
       *              \ FFFE:****:****:****
       *     Integer {  FFFF:0000:IIII:IIII
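The layout quoted above can be imitated with 64-bit integers. The sketch below is modelled on that comment; the constant `DOUBLE_OFFSET` (adding 2^48 to a double's bit pattern) is my reading of how the 0001 to FFFE range is reached and should be treated as an assumption, as should all function names.

```python
import struct

DOUBLE_OFFSET = 1 << 48            # assumed offset that lifts doubles above the pointer range
INT_TAG = 0xFFFF000000000000       # top 16 bits all set marks a 32-bit integer
MASK64 = (1 << 64) - 1


def encode_double(value):
    """Encode a double; assumes NaNs are canonical (no payload in the top bits)."""
    bits = struct.unpack("<Q", struct.pack("<d", value))[0]
    return (bits + DOUBLE_OFFSET) & MASK64


def encode_int32(value):
    return INT_TAG | (value & 0xFFFFFFFF)


def encode_pointer(address):
    if address >> 48:
        raise ValueError("pointer must fit in 48 bits")
    return address


def kind(word):
    top = word >> 48
    if top == 0x0000:
        return "pointer"
    if top == 0xFFFF:
        return "int32"
    return "double"


def decode_double(word):
    bits = (word - DOUBLE_OFFSET) & MASK64
    return struct.unpack("<d", struct.pack("<Q", bits))[0]


def decode_int32(word):
    value = word & 0xFFFFFFFF
    return value - (1 << 32) if value & 0x80000000 else value


for word in (encode_pointer(0x7fc1ba3070e0), encode_double(1.5), encode_int32(-7)):
    print(f"{word:016x}", kind(word))
```

Every value fits in one 64-bit word, and the type test is a comparison on the top 16 bits.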
  58. V8
  59. V8 • Lars Bak • Hidden class, PICs • Built-in objects written in JavaScript • Crankshaft • Precise generational GC
  60. Lars Bak • Implementing VMs since 1988 • Beta • Self • HotSpot
  61. Source code -> Native code; Crankshaft: Source code -> High-Level IR -> Low-Level IR -> Optimized native code
  62. Hotspot client compiler
  63. Crankshaft • Profiling • Compiler optimization • On-stack replacement • Deoptimization
  64. High-Level IR (Hydrogen) • function inlining • type inference • stack check elimination • loop-invariant code motion • common subexpression elimination • ...
  65. Low-Level IR (Lithium) • linear-scan register allocator • code generation • lazy deoptimization
  66. Built-in objects written in JS (v8/src/array.js)
      function ArraySort(comparefn) {
        if (IS_NULL_OR_UNDEFINED(this) && !IS_UNDETECTABLE(this)) {
          throw MakeTypeError("called_on_null_or_undefined", ["Array.prototype.sort"]);
        }
        // In-place QuickSort algorithm.
        // For short (length <= 22) arrays, insertion sort is used for efficiency.
        if (!IS_SPEC_FUNCTION(comparefn)) {
          comparefn = function (x, y) {
            if (x === y) return 0;
            if (%_IsSmi(x) && %_IsSmi(y)) {
              return %SmiLexicographicCompare(x, y);
            }
            x = ToString(x);
            y = ToString(y);
            if (x == y) return 0;
            else return x < y ? -1 : 1;
          };
        }
        ...
  67. GC
  68. V8 performance
  69. Can V8 be faster?
  70. Dart • Clear syntax, optional types, libraries • Performance • Can compile to JavaScript • But IE, WebKit and Mozilla rejected it • What do you think? • My thought: Will XML replace HTML? No, but thanks, Google, for pushing the web forward
  71. Embed V8
  72. Embed
  73. Expose Function
      v8::Handle<v8::Value> Print(const v8::Arguments& args) {
        for (int i = 0; i < args.Length(); i++) {
          v8::HandleScope handle_scope;
          v8::String::Utf8Value str(args[i]);
          const char* cstr = ToCString(str);
          printf("%s", cstr);
        }
        return v8::Undefined();
      }
      v8::Handle<v8::ObjectTemplate> global = v8::ObjectTemplate::New();
      global->Set(v8::String::New("print"), v8::FunctionTemplate::New(Print));
  74. Node.JS • Pros: async; one language for everything; faster than PHP, Python; community • Cons: lack of great libraries; ES5 code hard to maintain; still too young
  75. JavaScriptCore (Nitro)
  76. Where does it come from?
  77. 1997 Macworld
  78. “Apple has decided to make Internet Explorer its default browser on the Macintosh.” “Since we believe in choice, we're going to be shipping other Internet browsers...” Steve Jobs
  79. JavaScriptCore History • 2001 KJS (kde-2.2): Bison; AST interpreter • 2008 SquirrelFish: bytecode (register); direct threading • 2008 SquirrelFish Extreme: PICs; method JIT; regular expression JIT; DFG JIT (March 2011)
  80. Diagram: Interpreter, AST, Bytecode, Method JIT, SSA, DFG JIT
  81. SpiderMonkey
  82. Monkey • SpiderMonkey: written by Brendan Eich; interpreter • TraceMonkey: trace JIT; removed • JägerMonkey: PICs; method JIT (from JSC) • IonMonkey: type inference; compiler optimization
  83. IonMonkey • SSA • function inlining • linear-scan register allocation • dead code elimination • loop-invariant code motion • ...
  85. Chakra (IE9)
  86. Chakra • Interpreter/JIT • Type system (hidden class) • PICs • Deferred parsing • Uses UTF-8 internally
  87. Unlocking the JavaScript Opportunity with Internet Explorer 9
  89. Carakan (Opera)
  90. Carakan • Register VM • Method JIT, regex JIT • Hidden type • Function inlining
  91. Rhino and JVM
  92. Rhino is SLOW, why?
  93. Because JVM is slow?
  94. The JVM didn't support dynamic languages well
  95. Solution: invokedynamic
  96. Hard to optimize in the JVM. Before: Caller -> some tricks -> Method. After (invokedynamic): Caller -> method handle -> Method
  97. One ring to rule them all?
  98. Rhino + invokedynamic • Pros: easier to implement; lots of great Java libraries; JVM optimization for free • Cons: only in JVM 7; not fully optimized yet; hard to beat V8
  99. Compiler optimization is HARD
  100. Is there an easy way?
  101. LLVM
  102. LLVM • Clang, VMKit, GHC, PyPy, Rubinius ... • DragonEgg: replaces the GCC back end • IR • Optimization • Link, code generation, JIT • Apple
  103. LLVM simplify
  104. C source:
         int foo(int bar) {
           int one = 1;
           return bar + one;
         }
         int main() {
           foo(3);
         }
       LLVM IR:
         define i32 @foo(i32 %bar) nounwind ssp {
         entry:
           %bar_addr = alloca i32, align 4
           %retval = alloca i32
           %0 = alloca i32
           %one = alloca i32
           %"alloca point" = bitcast i32 0 to i32
           store i32 %bar, i32* %bar_addr
           store i32 1, i32* %one, align 4
           %1 = load i32* %bar_addr, align 4
           %2 = load i32* %one, align 4
           %3 = add nsw i32 %1, %2
           store i32 %3, i32* %0, align 4
           %4 = load i32* %0, align 4
           store i32 %4, i32* %retval, align 4
           br label %return
         return:
           %retval1 = load i32* %retval
           ret i32 %retval1
         }
         define i32 @main() nounwind ssp {
         entry:
           %retval = alloca i32
           %"alloca point" = bitcast i32 0 to i32
           %0 = call i32 @foo(i32 3) nounwind ssp
           br label %return
         return:
           %retval1 = load i32* %retval
           ret i32 %retval1
         }
  105. Optimization
       Before:
         define i32 @foo(i32 %bar) nounwind ssp {
         entry:
           %bar_addr = alloca i32, align 4
           %retval = alloca i32
           %0 = alloca i32
           %one = alloca i32
           %"alloca point" = bitcast i32 0 to i32
           store i32 %bar, i32* %bar_addr
           store i32 1, i32* %one, align 4
           %1 = load i32* %bar_addr, align 4
           %2 = load i32* %one, align 4
           %3 = add nsw i32 %1, %2
           store i32 %3, i32* %0, align 4
           %4 = load i32* %0, align 4
           store i32 %4, i32* %retval, align 4
           br label %return
         return:
           %retval1 = load i32* %retval
           ret i32 %retval1
         }
         define i32 @main() nounwind ssp {
         entry:
           %retval = alloca i32
           %"alloca point" = bitcast i32 0 to i32
           %0 = call i32 @foo(i32 3) nounwind ssp
           br label %return
         return:
           %retval1 = load i32* %retval
           ret i32 %retval1
         }
       After:
         define i32 @foo(i32 %bar) nounwind readnone ssp {
         entry:
           %0 = add nsw i32 %bar, 1
           ret i32 %0
         }
         define i32 @main() nounwind readnone ssp {
         entry:
           ret i32 undef
         }
  106. Optimization (70+)
  107. LLVM backend
         define i32 @foo(i32 %bar) nounwind readnone ssp {
         entry:
           %0 = add nsw i32 %bar, 1
           ret i32 %0
         }
         define i32 @main() nounwind readnone ssp {
         entry:
           ret i32 undef
         }
  108. [Figure 4: LLVM system architecture diagram. Components shown: Compiler FE 1 ... Compiler FE N, .o files, LLVM Linker (IPO/IPA), Native CodeGen, JIT, exe & libraries, CPU, Runtime Optimizer, Profile & Trace Info, Offline Reoptimizer.]
  109. LLVM on JavaScript
  110. Emscripten • C/C++ to LLVM IR • LLVM IR to JavaScript • Runs in the browser
  111. LLVM IR:
         define i32 @foo(i32 %bar) nounwind readnone ssp {
         entry:
           %0 = add nsw i32 %bar, 1
           ret i32 %0
         }
         define i32 @main() nounwind readnone ssp {
         entry:
           ret i32 undef
         }
       Generated JavaScript:
         ...
         function _foo($bar) {
           var __label__;
           var $0=((($bar)+1)|0);
           return $0;
         }
         function _main() {
           var __label__;
           return undef;
         }
         Module["_main"] = _main;
         ...
  112. Emscripten demo • Python, Ruby, Lua virtual machines • OpenJPEG • Poppler • FreeType • ...
  113. Performance? Good enough!
       benchmark            SM      V8      gcc     ratio
       fannkuch (10)        1.158   0.931   0.231   4.04
       fasta (2100000)      1.115   1.128   0.452   2.47
       primes               1.443   3.194   0.438   3.29
       raytrace (7,256)     1.930   2.944   0.228   8.46
       dlmalloc (400,400)   5.050   1.880   0.315   5.97
       The first column is the name of the benchmark, and in parentheses any parameters used in running it.
  114. JavaScript on LLVM
  115. Fabric Engine • JavaScript integration • Native code compilation (LLVM) • Multi-threaded execution • OpenGL rendering
  116. Fabric Engine
  117. Conclusion?
  118. “All problems in computer science can be solved by another level of indirection.” David Wheeler
  119. References • The Behavior of Efficient Virtual Machine Interpreters on Modern Architectures • Virtual Machine Showdown: Stack Versus Registers • The Implementation of Lua 5.0 • Why Is the New Google V8 Engine so Fast? • Context Threading: A Flexible and Efficient Dispatch Technique for Virtual Machine Interpreters • Effective Inline-Threaded Interpretation of Java Bytecode Using Preparation Sequences • Smalltalk-80: The Language and Its Implementation
  120. References • Design of the Java HotSpot™ Client Compiler for Java 6 • Oracle JRockit: The Definitive Guide • Virtual Machines: Versatile Platforms for Systems and Processes • Fast and Precise Hybrid Type Inference for JavaScript • LLVM: A Compilation Framework for Lifelong Program Analysis & Transformation • Emscripten: An LLVM-to-JavaScript Compiler • An Analysis of the Dynamic Behavior of JavaScript Programs
  121. References • Adaptive Optimization for SELF • Bytecodes meet Combinators: invokedynamic on the JVM • Context Threading: A Flexible and Efficient Dispatch Technique for Virtual Machine Interpreters • Efficient Implementation of the Smalltalk-80 System • Design, Implementation, and Evaluation of Optimizations in a Just-In-Time Compiler • Optimizing Direct Threaded Code by Selective Inlining • Linear Scan Register Allocation • Optimizing Invokedynamic
  122. References • Representing Type Information in Dynamically Typed Languages • The Behavior of Efficient Virtual Machine Interpreters on Modern Architectures • Trace-based Just-in-Time Type Specialization for Dynamic Languages • The Structure and Performance of Efficient Interpreters • Know Your Engines: How to Make Your JavaScript Fast • IE Blog, Chromium Blog, WebKit Blog, Opera Blog, Mozilla Blog, Wingolog's Blog, RednaxelaFX's Blog, David Mandelin's Blog...
  123. Thank you