Successfully reported this slideshow.

Javascript engine performance

42

Share

Loading in …3
×
1 of 90
1 of 90

More Related Content

Related Books

Free with a 14 day trial from Scribd

See all

Javascript engine performance

  1. 1. JavaScript Engine Performance
  2. 2. 关于我 • Baidu资深工程师 • 目前主要做性能优化相关的工作 • 参与W3C的“HTML” 和“Web Performance” 工作组 @nwind @nwind
  3. 3. 请注意 • 我不是虚拟机的专家,仅仅是业余兴趣 • 很多内容都经过了简化,实际情况要复杂很多 • 这里面的观点仅代表我个人看法
  4. 4. 大纲 • 虚拟机的基本原理 • JavaScript引擎是如何优化性能的 • V8、Dart、Node.js的介绍 • 如何编写高性能的JavaScript代码
  5. 5. VM basic
  6. 6. Virtual Machine history • pascal 1970 • smalltalk 1980 • self 1986 • python 1991 • java 1995 • javascript 1995
  7. 7. Smalltalk的演示展现了三项惊人的成果。包括电脑之间如何实现 联网,以及面向对象编程是如何工作的。 但乔布斯和他的团队对这些并不感兴趣,因为他们的注意力被...
  8. 8. How Virtual Machine Work? • Parser • Intermediate Representation • Interpreter, JIT • Runtime, Garbage Collection
  9. 9. Parser • Tokenize • AST
  10. 10. Tokenize identifier number keyword var foo = 10; semicolon equal
  11. 11. AST Assign Variable foo Constant 10
  12. 12. Intermediate Representation • Bytecode • Stack vs. register
  13. 13. Bytecode (SpiderMonkey) 00000: deffun 0 null 00005: nop 00006: callvar 0 function foo(bar) { 00009: int8 2 00011: call 1 return bar + 1; 00014: pop } 00015: stop foo: foo(2); 00020: getarg 0 00023: one 00024: add 00025: return 00026: stop
  14. 14. Bytecode (JSC) 8 m_instructions; 168 bytes at 0x7fc1ba3070e0; 1 parameter(s); 10 callee register(s) [ 0] enter [ 1] mov! ! r0, undefined(@k0) [ 4] get_global_var! r1, 5 [ 7] mov! ! r2, undefined(@k0) function foo(bar) { [ [ 10] 13] mov! ! call!! r3, 2(@k1) r1, 2, 10 return bar + 1; [ [ 17] 19] op_call_put_result! ! end! ! r0 r0 } Constants: k0 = undefined k1 = 2 foo(2); 3 m_instructions; 64 bytes at 0x7fc1ba306e80; 2 parameter(s); 1 callee register(s) [ 0] enter [ 1] add! ! r0, r-7, 1(@k0) [ 6] ret! ! r0 Constants: k0 = 1 End: 3
  15. 15. Stack vs. register • Stack • JVM, .NET, PHP, Python, Old JavaScript engine • Register • Lua, Dalvik, Modern JavaScript engine • Smaller, Faster (about 20%~30%) • RISC
  16. 16. Stack vs. register local a,t,i 1: PUSHNIL 3 a=a+i 2: GETLOCAL 0 ; a 3: GETLOCAL 2 ; i 4: ADD local a,t,i 1: LOADNIL 0 2 0 5: SETLOCAL 0 ; a a=a+i 2: ADD 0 0 2 a=a+1 6: SETLOCAL 0 ; a a=a+1 3: ADD 0 0 250 ; a 7: ADDI 1 a=t[i] 4: GETTABLE 0 1 2 8: SETLOCAL 0 ; a a=t[i] 9: GETLOCAL 1 ; t 10: GETINDEXED 2 ; i 11: SETLOCAL 0 ; a
  17. 17. Interpreter • Switch statement • Direct threading, Indirect threading, Token threading ...
  18. 18. Switch statement while (true) { ! switch (opcode) { ! ! case ADD: ! ! ! ... ! ! ! break; ! ! case SUB: ! ! ! ... ! ! ! break; ... !} }
  19. 19. Direct threading typedef void *Inst; Inst program[] = { &&ADD, &&SUB }; Inst *ip = program; goto *ip++; ADD: ... goto *ip++; SUB: ... http://gcc.gnu.org/onlinedocs/gcc/Labels-as-Values.html
  20. 20. Threaded Code
  21. 21. http://en.wikipedia.org/wiki/File:Pipeline,_4_stage.svg
  22. 22. Context Threading Essence of our Solution … CTT - Context iload_1 Threading Table Bytecode bodies iload_1 (generated code) (ret terminated) iadd call iload_1 iload_1: istore_1 iload_1 call iload_1 .. bipush 64 call iadd ret; if_icmplt 2 call istore_1 … call iload_1 iadd: .. .. ret; Return Branch Predictor Stack Package bodies as subroutines andtechnique for virtual machine interpreters Context Threading: A flexible and efficient dispatch call them
  23. 23. Garbage Collection • Reference counting (php, python ...), Smart pointer • Tracing • Generational • Stop-the-world, Concurrent, Incremental • Copying, Sweep, Compact
  24. 24. Why JavaScript is slow? • Dynamic Type • Weak Type • Need to parse every time • GC
  25. 25. Fight with Weak Type
  26. 26. Object model in most VM typedef union { void *p; double d; long l; } Value; typedef struct { unsigned char type; Value value; } Object; Object a;
  27. 27. Tagged pointer
  28. 28. 在几乎所有系统中,指针地址会对齐 (4或8字节) http://www.gnu.org/s/libc/manual/html_node/Aligned-Memory-Blocks.html
  29. 29. 这意味着 0xc00ab958 指针的最后2或3个位⼀一定是0 可以在最后⼀一位加1来表示指针 1 0 0 1 1 0 0 0 9 8 Pointer Small Number
  30. 30. Tagged pointer Memory ... var a = 1 2 var b = {a:1} 0x3d2aa00 ... ... object b ...
  31. 31. Small Number 2 − 1 = 1073741823 30 −2 = −1073741824 30 31位能表示十亿,对大部分应用来说足够了
  32. 32. External Fixed Typed Array • Strong type, Fixed length • Out of VM heap • Example: Int32Array, Float64Array
  33. 33. Small Number + Typed Array Seconds (smaller is better) 4200 5000 4020 3750 3180 2500 40x 1250 50 70 80 0 C/C++ Java(HotSpot) V8 PHP Ruby Python http://shootout.alioth.debian.org/u32/performance.php?test=fannkuchredux
  34. 34. Warning: Benchmark lies
  35. 35. ES6 will have struct
  36. 36. ES6 StructType Point2D = new StructType({ Color = new StructType({ ! x: uint32, ! r: uint8, ! y: uint32 ! g: uint8, }); ! b: uint8 }); Pixel = new StructType({ ! point: Point2D, ! color: Color });
  37. 37. Use typed array to run faster
  38. 38. Fight with Dynamic Type
  39. 39. foo.bar
  40. 40. foo.bar in C movl 4(%edx), %ecx //get movl %ecx, 4(%edx) //put
  41. 41. foo.bar in JavaScript found = HashTable.FindEntry(key) if (found) return found; for (pt = GetPrototype(); pt != null; pt = pt.GetPrototype()) { found = pt.HashTable.FindEntry(key) if (found) return found; }
  42. 42. How to optimize?
  43. 43. First, We need to know Object layout
  44. 44. Add Type for object add property y add property x http://code.google.com/apis/v8/design.html
  45. 45. Inline Cache • Slow lookup at first time • Modify the JIT code in-place • Next time will directly jump to the address
  46. 46. Inline cache make simple return foo.lookupProperty(bar); function fun(foo) { return foo.bar; } if (foo[hiddenClass] == 0xfe1) { return foo[indexOf_bar]; } return foo.lookupProperty(bar);
  47. 47. 实际代码中的JS并不会那么动态 Delete操作只占了0.1% “An Analysis of the Dynamic Behavior of JavaScript...” 99%的原始类型可以在运行通过静态分析确定 97%的属性访问可以被inline cache “TypeCastor: Demystify Dynamic Typing of JavaScript...”
  48. 48. V8 can’t handle delete yet 20x times slower! http://jsperf.com/test-v8-delete
  49. 49. Avoid alter object property layout
  50. 50. Faster Data Structure & Algorithm
  51. 51. Array push is faster than String concat?
  52. 52. http://jsperf.com/nwind-string-concat-vs-array-push
  53. 53. Why?
  54. 54. other string optimizations • Adaptive string search • Single char, Linear, Boyer-Moore-Horspool • Adaptive ascii and utf-8 • Zero copy sub string
  55. 55. Feel free to use String in modern Engine
  56. 56. Just-In-Time (JIT)
  57. 57. JIT • Method JIT, Trace JIT, Regular expression JIT • Register allocation • Code generation
  58. 58. How JIT work? • mmap, malloc (mprotect) • generate native code • cast (c), reinterpret_cast (c++) • call the function
  59. 59. V8
  60. 60. V8 • Lars Bak • Hidden Class, PICs • Some of Built-in objects are written in JavaScript • Crankshaft • Precise generation GC
  61. 61. Lars Bak • implement VM since 1988 • Beta • Self • JVM (VM architect at Sun) • V8 (Google)
  62. 62. Lines of code (VM only) .cpp/.c .h 500000 110831 375000 250000 70787 359986 63975 125000 224038 80867 8043 15475 135547 120941 108280 42113 83920 44646 0 HotSpot V8 SpiderMonkey JSC Ruby CPython PHP-Zend
  63. 63. Crankshaft
  64. 64. Source code Native Code runtime profiling High-Level IR Low-Level IR Opt Native Code } Crankshaft
  65. 65. Crankshaft • Profiling • Compiler optimization • Generate new JIT code • On-stack replacement • Deoptimize
  66. 66. High-Level IR (Hydrogen) • AST to SSA • Type inference (type feedback from inline cache) • Compiler optimization • Function inline • Loop-invariant code motion, Global value numbering • Eliminate dead phis • ...
  67. 67. Loop-invariant code motion tmp = x + y; for (i = 0; i < n; i++) { for (i = 0; i < n; i++) { a[i] = x + y; a[i] = tmp; } }
  68. 68. Function inline limit for now • big function (large than 600 bytes) • have recursive • have unsupported statements • with, switch • try/catch/finally • ...
  69. 69. Avoid “with”, “switch” and “try” in hot path
  70. 70. Built-in objects written in JS function ArraySort(comparefn) { ... // In-place QuickSort algorithm. // For short (length <= 22) arrays, insertion sort is used for efficiency. if (!IS_SPEC_FUNCTION(comparefn)) { comparefn = function (x, y) { if (x === y) return 0; if (%_IsSmi(x) && %_IsSmi(y)) { return %SmiLexicographicCompare(x, y); } x = ToString(x); y = ToString(y); if (x == y) return 0; else return x < y ? -1 : 1; }; } ... v8/src/array.js
  71. 71. GC • Precise • Stop-the-world • Generation • Incremental (2011-10)
  72. 72. V8 performance
  73. 73. V8 performance
  74. 74. V8 performance Why?
  75. 75. V8 performance Unfair, they are using gmp library
  76. 76. Warning: Benchmark lies
  77. 77. Node.JS • Pros • Cons • Easy to write Async I/O • Lack of great libraries • One language for everything • Large JS is hard to maintain • Maybe Faster than PHP, Python • Easy to have Memory leak (compare to PHP, Erlang) • Bet on JavaScript is safe • Still too youth, unproved
  78. 78. Why Dart? • Build for large application • option type, structured, libraries, tools • Performance • lightweight process like erlang • easy to write a faster vm than javascript
  79. 79. The future of Dart? • It will not replace JS • But it may replace GWT, and become a better choice for Building large front-end application • with great IDE, mature libraries • and some way to communicate with JavaScript
  80. 80. How to make JavaScript faster?
  81. 81. How to make JavaScript faster? • Wait for ES6: StructType, const, WeakMap, yield... • High performance build-in library • WebCL • Embed another language • KL(FabricEngine), GLSL(WebGL) • Wait for Quantum computer :)
  82. 82. Things you can learn also • NaN tagging • Polymorphic Inline Cache • Type Inference • Regex JIT • Runtime optimization • ...
  83. 83. References • The behavior of efficient virtual • Context Threading: A Flexible and machine interpreters on modern Efficient Dispatch Technique for architectures Virtual Machine Interpreters • Virtual Machine Showdown: Stack • Effective Inline-Threaded Versus Registers Interpretation of Java Bytecode Using Preparation Sequences • The implementation of Lua 5.0 • Smalltalk-80: the language and its • Why Is the New Google V8 Engine implementation so Fast?
  84. 84. References • Design of the Java HotSpotTM • LLVM: A Compilation Framework Client Compiler for Java 6 for Lifelong Program Analysis & Transformation • Oracle JRockit: The Definitive Guide • Emscripten: An LLVM-to-JavaScript • Virtual Machines: Versatile Compiler platforms for systems and processes • An Analysis of the Dynamic Behavior of JavaScript Programs • Fast and Precise Hybrid Type Inference for JavaScript
  85. 85. References • Adaptive Optimization for SELF • Design, Implementation, and Evaluation of Optimizations in a • Bytecodes meet Combinators: Just-In-Time Compiler invokedynamic on the JVM • Optimizing direct threaded code by • Context Threading: A Flexible and selective inlining Efficient Dispatch Technique for Virtual Machine Interpreters • Linear scan register allocation • Efficient Implementation of the • Optimizing Invokedynamic Smalltalk-80 System • Threaded Code
  86. 86. References • Why Not a Bytecode VM? • Making the Compilation "Pipeline" Explicit- Dynamic • A Survey of Adaptive Compilation Using Trace Tree Optimization in Virtual Machines Specialization • An Efficient Implementation of • Uniprocessor Garbage Collection SELF, a Dynamically-Typed Techniques Object-Oriented Language Based on Prototypes
  87. 87. References • Representing Type Information in • The Structure and Performance of Dynamically Typed Languages Efficient Interpreters • The Behavior of Efficient Virtual • Know Your Engines: How to Make Machine Interpreters on Modern Your JavaScript Fast Architectures • IE Blog, Chromium Blog, WebKit • Trace-based Just-in-Time Type Blog, Opera Blog, Mozilla Blog, Specialization for Dynamic Wingolog’s Blog, RednaxelaFX’s Languages Blog, David Mandelin’s Blog, Brendan Eich’s Blog...
  88. 88. !ank y"

×