JavaScript Engine Performance
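  1. JavaScript Engine Performance
  2. About me
     • Senior engineer at Baidu
     • Currently working mainly on performance optimization
     • Member of the W3C "HTML" and "Web Performance" working groups
     • @nwind
  3. Please note
     • I am not a virtual machine expert; this is only a hobby
     • Much of the content has been simplified; reality is far more complex
     • The opinions here are my own
  4. Outline
     • Basic principles of virtual machines
     • How JavaScript engines optimize performance
     • An introduction to V8, Dart, and Node.js
     • How to write high-performance JavaScript code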
  5. VM basics
  6. Virtual machine history
     • Pascal 1970
     • Smalltalk 1980
     • Self 1986
     • Python 1991
     • Java 1995
     • JavaScript 1995
  7. The Smalltalk demonstration showed three amazing achievements, including how computers could be networked and how object-oriented programming worked. But Jobs and his team were not interested in these, because their attention was drawn to...
  8. How does a virtual machine work?
     • Parser
     • Intermediate representation
     • Interpreter, JIT
     • Runtime, garbage collection
  9. Parser
     • Tokenize
     • AST
  10. Tokenize: var foo = 10;
      • var: keyword
      • foo: identifier
      • =: equal
      • 10: number
      • ;: semicolon
  11. AST
      AssignVariable foo
        Constant 10
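The tokenize and AST steps for `var foo = 10;` can be sketched as follows. This is a minimal illustration, not how any real engine parses: the regex-based scanner, the `TOKEN_SPEC` table, and the `tokenize` and `parseVarStatement` names are assumptions made for this example. Only the token kinds and the `AssignVariable` / `Constant` node names come from the slides.

```javascript
// Token kinds follow the slide's labels; the regex table itself is hypothetical.
const TOKEN_SPEC = [
  ["keyword", /^var\b/],
  ["identifier", /^[A-Za-z_$][A-Za-z0-9_$]*/],
  ["number", /^\d+/],
  ["equal", /^=/],
  ["semicolon", /^;/],
];

// Splits the source into { type, text } tokens, skipping whitespace.
function tokenize(source) {
  const tokens = [];
  let rest = source.trimStart();
  while (rest.length > 0) {
    const hit = TOKEN_SPEC
      .map(([type, re]) => [type, re.exec(rest)])
      .find(([, match]) => match !== null);
    if (!hit) throw new SyntaxError("Unexpected input: " + rest);
    const [type, match] = hit;
    tokens.push({ type, text: match[0] });
    rest = rest.slice(match[0].length).trimStart();
  }
  return tokens;
}

// Parses only `var <identifier> = <number>;` into the AST shape from the slide.
function parseVarStatement(tokens) {
  const expected = ["keyword", "identifier", "equal", "number", "semicolon"];
  expected.forEach((type, i) => {
    if (!tokens[i] || tokens[i].type !== type) {
      throw new SyntaxError("Expected " + type + " at token " + i);
    }
  });
  return {
    type: "AssignVariable",
    name: tokens[1].text,
    value: { type: "Constant", value: Number(tokens[3].text) },
  };
}

const tokens = tokenize("var foo = 10;");
const ast = parseVarStatement(tokens);
console.log(tokens.map((t) => t.type).join(" "));
// keyword identifier equal number semicolon
console.log(JSON.stringify(ast));
```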
  12. Intermediate representation
      • Bytecode
      • Stack vs. register
  13. Bytecode (SpiderMonkey)
      Source:
        function foo(bar) {
          return bar + 1;
        }
        foo(2);
      Bytecode:
        00000: deffun 0 null
        00005: nop
        00006: callvar 0
        00009: int8 2
        00011: call 1
        00014: pop
        00015: stop
        foo:
        00020: getarg 0
        00023: one
        00024: add
        00025: return
        00026: stop
  14. Bytecode (JSC)
      Source:
        function foo(bar) {
          return bar + 1;
        }
        foo(2);
      Bytecode:
        8 m_instructions; 168 bytes at 0x7fc1ba3070e0; 1 parameter(s); 10 callee register(s)
        [  0] enter
        [  1] mov                 r0, undefined(@k0)
        [  4] get_global_var      r1, 5
        [  7] mov                 r2, undefined(@k0)
        [ 10] mov                 r3, 2(@k1)
        [ 13] call                r1, 2, 10
        [ 17] op_call_put_result  r0
        [ 19] end                 r0
        Constants: k0 = undefined k1 = 2
        3 m_instructions; 64 bytes at 0x7fc1ba306e80; 2 parameter(s); 1 callee register(s)
        [  0] enter
        [  1] add                 r0, r-7, 1(@k0)
        [  6] ret                 r0
        Constants: k0 = 1
        End: 3
  15. Stack vs. register
      • Stack
        • JVM, .NET, PHP, Python, old JavaScript engines
      • Register
        • Lua, Dalvik, modern JavaScript engines
        • Smaller, faster (about 20%~30%)
        • RISC
  16. Stack vs. register
      Stack-based bytecode:
        local a,t,i     1: PUSHNIL 3
        a=a+i           2: GETLOCAL 0 ; a
                        3: GETLOCAL 2 ; i
                        4: ADD
                        5: SETLOCAL 0 ; a
        a=a+1           6: GETLOCAL 0 ; a
                        7: ADDI 1
                        8: SETLOCAL 0 ; a
        a=t[i]          9: GETLOCAL 1 ; t
                       10: GETINDEXED 2 ; i
                       11: SETLOCAL 0 ; a
      Register-based bytecode:
        local a,t,i     1: LOADNIL 0 2 0
        a=a+i           2: ADD 0 0 2
        a=a+1           3: ADD 0 0 250 ; a
        a=t[i]          4: GETTABLE 0 1 2
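To make the instruction-count difference concrete, here is a small sketch of `a = a + i` running on a stack machine and on a register machine. Both interpreters, `runStack` and `runRegister`, are hypothetical toys written for this example. They borrow the opcode names from the listings above but do not model Lua or any JavaScript engine.

```javascript
// Stack machine: operands are pushed and popped, so one addition needs four dispatches.
function runStack(program, locals) {
  const stack = [];
  let dispatched = 0;
  for (const [op, arg] of program) {
    dispatched++;
    switch (op) {
      case "GETLOCAL":
        stack.push(locals[arg]);
        break;
      case "SETLOCAL":
        locals[arg] = stack.pop();
        break;
      case "ADD": {
        const rhs = stack.pop();
        const lhs = stack.pop();
        stack.push(lhs + rhs);
        break;
      }
      default:
        throw new Error("Unknown opcode " + op);
    }
  }
  return { locals, dispatched };
}

// Register machine: each instruction names its operands, so one dispatch is enough.
function runRegister(program, registers) {
  let dispatched = 0;
  for (const [op, dst, lhs, rhs] of program) {
    dispatched++;
    switch (op) {
      case "ADD":
        registers[dst] = registers[lhs] + registers[rhs];
        break;
      default:
        throw new Error("Unknown opcode " + op);
    }
  }
  return { registers, dispatched };
}

// a = a + i, with a in slot 0 and i in slot 2.
const stackResult = runStack(
  [["GETLOCAL", 0], ["GETLOCAL", 2], ["ADD"], ["SETLOCAL", 0]],
  [5, null, 7]
);
const registerResult = runRegister([["ADD", 0, 0, 2]], [5, null, 7]);

console.log(stackResult.locals[0], stackResult.dispatched); // 12 4
console.log(registerResult.registers[0], registerResult.dispatched); // 12 1
```

Fewer dispatches is the usual argument for register bytecode; the price is wider instructions, since every operand is encoded explicitly.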
  17. Interpreter
      • Switch statement
      • Direct threading, indirect threading, token threading ...
  18. Switch statement
        while (true) {
          switch (opcode) {
            case ADD:
              ...
              break;
            case SUB:
              ...
              break;
            ...
          }
        }
  19. Direct threading
        typedef void *Inst;
        Inst program[] = { &&ADD, &&SUB };
        Inst *ip = program;
        goto **ip++;   /* load the label address, then jump to it */
        ADD:
          ...
          goto **ip++;
        SUB:
          ...
      http://gcc.gnu.org/onlinedocs/gcc/Labels-as-Values.html
  20. Threaded Code
  21. http://en.wikipedia.org/wiki/File:Pipeline,_4_stage.svg
  22. Context Threading: essence of the solution
      • Bytecode: iload_1, iload_1, iadd, istore_1, iload_1, bipush 64, if_icmplt 2, …
      • CTT, the Context Threading Table (generated code): call iload_1, call iload_1, call iadd, call istore_1, call iload_1, ..
      • Bytecode bodies (ret terminated): iload_1: .. ret; iadd: .. ret;
      • Return Branch Predictor Stack
      • Package bodies as subroutines and call them
      Source: "Context Threading: A flexible and efficient dispatch technique for virtual machine interpreters"
  23. Garbage collection
      • Reference counting (PHP, Python ...), smart pointers
      • Tracing
        • Generational
        • Stop-the-world, concurrent, incremental
        • Copying, sweep, compact
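As a rough illustration of the tracing approach, the sketch below runs a stop-the-world mark-and-sweep pass over a toy heap. The `makeObject`, `mark`, and `sweep` helpers and the object layout are assumptions made for this example and are far simpler than a real collector. The example also shows a well-known difference from pure reference counting: objects `c` and `d` refer to each other, so their counts would never reach zero, yet tracing reclaims them because no root reaches them.

```javascript
// Hypothetical heap object: a name, outgoing references, and a mark bit.
function makeObject(name) {
  return { name, refs: [], marked: false };
}

// Mark phase: flag everything reachable from the roots.
function mark(roots) {
  const worklist = [...roots];
  while (worklist.length > 0) {
    const obj = worklist.pop();
    if (obj.marked) continue;
    obj.marked = true;
    worklist.push(...obj.refs);
  }
}

// Sweep phase: keep marked objects and clear their mark bits for the next cycle.
function sweep(heap) {
  const live = heap.filter((obj) => obj.marked);
  live.forEach((obj) => {
    obj.marked = false;
  });
  return live;
}

const a = makeObject("a");
const b = makeObject("b");
const c = makeObject("c");
const d = makeObject("d");
a.refs.push(b);
c.refs.push(d);
d.refs.push(c); // c and d form a cycle that no root reaches

let heap = [a, b, c, d];
mark([a]);
heap = sweep(heap);
console.log(heap.map((obj) => obj.name).join(",")); // a,b
```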
  24. Why is JavaScript slow?
      • Dynamic typing
      • Weak typing
      • Source must be parsed every time
      • GC
  25. Fighting weak typing
  26. Object model in most VMs
        typedef union {
          void *p;
          double d;
          long l;
        } Value;
        typedef struct {
          unsigned char type;
          Value value;
        } Object;
        Object a;
  27. Tagged pointer
  28. On almost all systems, pointer addresses are aligned (to 4 or 8 bytes)
      http://www.gnu.org/s/libc/manual/html_node/Aligned-Memory-Blocks.html
  29. This means the last 2 or 3 bits of a pointer such as 0xc00ab958 are always 0, so the last bit can be set to 1 to mark a pointer
      • Low byte 0x98 = 1001 1000
      • Last bit 1: pointer
      • Last bit 0: small number
  30. Tagged pointer (memory diagram)
      • var a = 1 is stored as 2
      • var b = {a:1} is stored as 0x3d2aa00 ..., which points to object b
  31. Small number
      • Maximum: 2^30 − 1 = 1073741823
      • Minimum: −2^30 = −1073741824
      • 31 bits can represent about a billion, which is enough for most applications
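The tagging scheme can be sketched with 32-bit integer arithmetic. This is only an illustration of the idea on the slides (low bit 0 for a small number, low bit 1 for a pointer); the helper names are made up for this example, and real engines do this in C++ on machine words rather than on JavaScript numbers.

```javascript
const SMI_MAX = 2 ** 30 - 1; // 1073741823
const SMI_MIN = -(2 ** 30); // -1073741824

// Small number: shift left by one, leaving the low bit 0.
function encodeSmi(n) {
  if (!Number.isInteger(n) || n < SMI_MIN || n > SMI_MAX) {
    throw new RangeError(n + " does not fit in 31 bits");
  }
  return n << 1;
}

// Pointer: aligned addresses end in 00, so the low bit is free to carry the tag.
function encodePointer(address) {
  if ((address & 3) !== 0) throw new RangeError("address must be 4-byte aligned");
  return (address | 1) >>> 0;
}

function isSmi(word) {
  return (word & 1) === 0;
}

// Arithmetic shift right restores the value and keeps the sign.
function decodeSmi(word) {
  return word >> 1;
}

// Clearing the tag bit restores the original address.
function decodePointer(word) {
  return (word & ~1) >>> 0;
}

console.log(encodeSmi(1)); // 2
console.log(decodeSmi(encodeSmi(-42))); // -42
console.log(encodePointer(0xc00ab958).toString(16)); // c00ab959
console.log(isSmi(encodeSmi(7)), isSmi(encodePointer(0xc00ab958))); // true false
```

A value outside the 31-bit range cannot be stored this way; in that case an engine has to fall back to a boxed number, which is why staying within small integers tends to be cheaper.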
  32. External fixed typed array
      • Strongly typed, fixed length
      • Allocated outside the VM heap
      • Examples: Int32Array, Float64Array
  33. Small number + typed array
      [Bar chart: fannkuch-redux run time in seconds (smaller is better) for C/C++, Java (HotSpot), V8, PHP, Ruby, and Python, with a "40x" annotation]
      http://shootout.alioth.debian.org/u32/performance.php?test=fannkuchredux
  34. Warning: benchmarks lie
  35. ES6 will have struct
  36. ES6 StructType
        Point2D = new StructType({
          x: uint32,
          y: uint32
        });
        Color = new StructType({
          r: uint8,
          g: uint8,
          b: uint8
        });
        Pixel = new StructType({
          point: Point2D,
          color: Color
        });
  37. Use typed arrays to run faster
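A short sketch of what "strongly typed, fixed length" means in practice, using the standard `Float64Array` and `Int32Array` constructors. The `sumOfSquares` helper is a name invented for this example. Whether typed arrays are actually faster depends on the engine and the workload, so this shows behavior only, not timings.

```javascript
// Every element is a 64-bit float, so the loop never meets a string or an object.
function sumOfSquares(values) {
  let total = 0;
  for (let i = 0; i < values.length; i++) {
    total += values[i] * values[i];
  }
  return total;
}

const samples = new Float64Array(4); // fixed length, zero-filled
samples.set([1.5, 2, 2.5, 3]);
console.log(sumOfSquares(samples)); // 21.5

// Int32Array coerces every write to a 32-bit signed integer.
const counters = new Int32Array(2);
counters[0] = 2 ** 31; // wraps around to -2147483648
counters[1] = 3.7; // fraction is dropped, stores 3
console.log(counters[0], counters[1], counters.length); // -2147483648 3 2
```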
  38. Fighting dynamic typing
  39. foo.bar
  40. foo.bar in C
        movl 4(%edx), %ecx   //get
        movl %ecx, 4(%edx)   //put
  41. foo.bar in JavaScript
        found = HashTable.FindEntry(key)
        if (found) return found;
        for (pt = GetPrototype(); pt != null; pt = pt.GetPrototype()) {
          found = pt.HashTable.FindEntry(key)
          if (found) return found;
        }
  42. How to optimize?
  43. First, we need to know the object layout
  44. Add a type for each object
      • add property x
      • add property y
      http://code.google.com/apis/v8/design.html
  45. Inline cache
      • Slow lookup the first time
      • Modify the JIT code in place
      • The next time, jump directly to the address
  46. Inline caching made simple
      Source:
        function fun(foo) {
          return foo.bar;
        }
      Before caching:
        return foo.lookupProperty(bar);
      After caching:
        if (foo[hiddenClass] == 0xfe1) {
          return foo[indexOf_bar];
        }
        return foo.lookupProperty(bar);
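The hidden class and inline cache ideas can be modeled in plain JavaScript. The sketch below is a simulation under stated assumptions, not V8's implementation: `Shape`, `ModelObject`, and `makeGetSite` are hypothetical names, the cache is a pair of fields rather than patched machine code, and prototype chains are ignored.

```javascript
// Hypothetical hidden class: maps property names to slot indexes.
class Shape {
  constructor(slots = new Map()) {
    this.slots = slots;
    this.transitions = new Map();
  }
  // Adding the same property to the same shape always yields the same child shape.
  addProperty(name) {
    if (!this.transitions.has(name)) {
      const slots = new Map(this.slots);
      slots.set(name, slots.size);
      this.transitions.set(name, new Shape(slots));
    }
    return this.transitions.get(name);
  }
}

const ROOT_SHAPE = new Shape();

class ModelObject {
  constructor() {
    this.shape = ROOT_SHAPE;
    this.storage = [];
  }
  set(name, value) {
    if (!this.shape.slots.has(name)) {
      this.shape = this.shape.addProperty(name);
    }
    this.storage[this.shape.slots.get(name)] = value;
  }
}

// One cache per access site: it remembers the last shape seen and the slot it used.
function makeGetSite(name) {
  const site = { cachedShape: null, cachedSlot: -1, hits: 0, misses: 0 };
  site.get = (obj) => {
    if (obj.shape === site.cachedShape) {
      site.hits++; // fast path: one comparison, one indexed load
      return obj.storage[site.cachedSlot];
    }
    site.misses++; // slow path: full lookup, then remember the result
    const slot = obj.shape.slots.get(name);
    if (slot === undefined) return undefined;
    site.cachedShape = obj.shape;
    site.cachedSlot = slot;
    return obj.storage[slot];
  };
  return site;
}

const p1 = new ModelObject();
p1.set("x", 1);
p1.set("y", 2);
const p2 = new ModelObject();
p2.set("x", 3);
p2.set("y", 4);

const getY = makeGetSite("y");
console.log(getY.get(p1), getY.get(p2), getY.hits, getY.misses); // 2 4 1 1
```

Because `p1` and `p2` add `x` and then `y` in the same order, they end up sharing one shape, so the second read takes the fast path. An object built in the order `y`, `x` would get a different shape and miss the cache.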
  47. Real-world JavaScript is not that dynamic
      • delete operations account for only 0.1% ("An Analysis of the Dynamic Behavior of JavaScript...")
      • 99% of primitive types can be determined at run time through static analysis
      • 97% of property accesses can be inline cached ("TypeCastor: Demystify Dynamic Typing of JavaScript...")
  48. V8 can't handle delete yet: 20x slower!
      http://jsperf.com/test-v8-delete
  49. Avoid altering an object's property layout
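One way to follow this advice is to create every property in the constructor, always in the same order, and to clear a field by assignment rather than with `delete`. The sketch below assumes the engine tracks object layouts as described in the preceding slides; `Point` and `clearLabel` are names made up for this example, and the actual cost of `delete` varies by engine and version.

```javascript
function Point(x, y) {
  // Always create the same properties in the same order.
  this.x = x;
  this.y = y;
  this.label = null; // reserve optional fields up front instead of adding them later
}

function clearLabel(point) {
  point.label = null; // keeps the property set intact, unlike `delete point.label`
}

const point = new Point(1, 2);
point.label = "origin";
clearLabel(point);
console.log(Object.keys(point).join(",")); // x,y,label
```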
  50. Faster data structures & algorithms
  51. Is Array push faster than String concat?
  52. http://jsperf.com/nwind-string-concat-vs-array-push
  53. Why?
  54. Other string optimizations
      • Adaptive string search
        • Single char, linear, Boyer-Moore-Horspool
      • Adaptive ascii and utf-8
      • Zero-copy substring
  55. Feel free to use String in modern engines
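The two styles being compared look like this. The sketch only checks that they produce the same string and prints rough timings; `buildWithConcat`, `buildWithJoin`, and `timeIt` are illustrative names. Timings from a loop like this are noisy and engine dependent, so treat any numbers it prints as a prompt for measuring your own workload, not as a result.

```javascript
// Plain string concatenation.
function buildWithConcat(parts) {
  let out = "";
  for (const part of parts) {
    out += part;
  }
  return out;
}

// The older idiom: collect pieces in an array and join once at the end.
function buildWithJoin(parts) {
  const buffer = [];
  for (const part of parts) {
    buffer.push(part);
  }
  return buffer.join("");
}

// Very rough timer in milliseconds; not a substitute for a real benchmark harness.
function timeIt(fn, parts, rounds) {
  const start = Date.now();
  for (let i = 0; i < rounds; i++) {
    fn(parts);
  }
  return Date.now() - start;
}

const parts = Array.from({ length: 1000 }, (_, i) => "item" + i);
console.log(buildWithConcat(parts) === buildWithJoin(parts)); // true
console.log("concat ms:", timeIt(buildWithConcat, parts, 200));
console.log("join ms:", timeIt(buildWithJoin, parts, 200));
```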
  56. Just-In-Time (JIT)
  57. JIT
      • Method JIT, trace JIT, regular expression JIT
      • Register allocation
      • Code generation
  58. How does a JIT work?
      • mmap, malloc (mprotect)
      • Generate native code
      • cast (C), reinterpret_cast (C++)
      • Call the function
  59. V8
  60. V8
      • Lars Bak
      • Hidden classes, PICs
      • Some built-in objects are written in JavaScript
      • Crankshaft
      • Precise generational GC
  61. Lars Bak
      • Has implemented VMs since 1988
      • Beta
      • Self
      • JVM (VM architect at Sun)
      • V8 (Google)
  62. Lines of code (VM only)
      [Stacked bar chart: lines of code in .cpp/.c and .h files for HotSpot, V8, SpiderMonkey, JSC, Ruby, CPython, and PHP-Zend; y-axis from 0 to 500000]
  63. Crankshaft
  64. [Diagram: source code → native code; runtime profiling → high-level IR → low-level IR → optimized native code, with the optimizing stages grouped as Crankshaft]
  65. Crankshaft
      • Profiling
      • Compiler optimization
      • Generate new JIT code
      • On-stack replacement
      • Deoptimization
  66. High-level IR (Hydrogen)
      • AST to SSA
      • Type inference (type feedback from inline caches)
      • Compiler optimizations
        • Function inlining
        • Loop-invariant code motion, global value numbering
        • Eliminate dead phis
        • ...
  67. Loop-invariant code motion
      Before:
        for (i = 0; i < n; i++) {
          a[i] = x + y;
        }
      After:
        tmp = x + y;
        for (i = 0; i < n; i++) {
          a[i] = tmp;
        }
  68. Function inlining limits for now
      • Big functions (larger than 600 bytes)
      • Recursive functions
      • Functions with unsupported statements
        • with, switch
        • try/catch/finally
        • ...
  69. Avoid "with", "switch" and "try" in hot paths
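A common way to apply this is to keep the hot loop in its own small function and put the `try` in a thin wrapper that runs once per call. The sketch below assumes the inlining limits listed above, which describe Crankshaft at the time of the talk and may not hold for other engines or versions; `sumLengths` and `safeSumLengths` are names invented for this example.

```javascript
// Hot loop: no try/catch, with, or switch inside.
function sumLengths(items) {
  let total = 0;
  for (let i = 0; i < items.length; i++) {
    total += items[i].length;
  }
  return total;
}

// Error handling lives in a wrapper that runs once per call, not once per iteration.
function safeSumLengths(items) {
  try {
    return sumLengths(items);
  } catch (err) {
    return -1; // sentinel chosen for this example
  }
}

console.log(safeSumLengths(["ab", "cde"])); // 5
console.log(safeSumLengths(["ab", null])); // -1
```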
  70. Built-in objects written in JS
        function ArraySort(comparefn) {
          ...
          // In-place QuickSort algorithm.
          // For short (length <= 22) arrays, insertion sort is used for efficiency.
          if (!IS_SPEC_FUNCTION(comparefn)) {
            comparefn = function (x, y) {
              if (x === y) return 0;
              if (%_IsSmi(x) && %_IsSmi(y)) {
                return %SmiLexicographicCompare(x, y);
              }
              x = ToString(x);
              y = ToString(y);
              if (x == y) return 0;
              else return x < y ? -1 : 1;
            };
          }
          ...
      Source: v8/src/array.js
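The default comparator above converts both operands to strings, which is why sorting numbers without a compare function gives lexicographic order. A small example using only the standard `Array.prototype.sort`:

```javascript
const values = [10, 9, 1, 100];

// No comparator: elements are compared as strings.
const defaultOrder = values.slice().sort();

// Explicit numeric comparator.
const numericOrder = values.slice().sort((a, b) => a - b);

console.log(defaultOrder.join(",")); // 1,10,100,9
console.log(numericOrder.join(",")); // 1,9,10,100
```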
  71. GC
      • Precise
      • Stop-the-world
      • Generational
      • Incremental (2011-10)
  72. V8 performance
  73. V8 performance
  74. V8 performance: why?
  75. V8 performance: unfair, they are using the GMP library
  76. Warning: benchmarks lie
  77. Node.js
      • Pros
        • Easy to write async I/O
        • One language for everything
        • Maybe faster than PHP, Python
        • Betting on JavaScript is safe
      • Cons
        • Lack of great libraries
        • Large JS codebases are hard to maintain
        • Easy to leak memory (compared to PHP, Erlang)
        • Still too young, unproven
  78. Why Dart?
      • Built for large applications
        • Optional types, structure, libraries, tools
      • Performance
        • Lightweight processes like Erlang
        • Easier to write a fast VM for than JavaScript
  79. The future of Dart?
      • It will not replace JS
      • But it may replace GWT and become a better choice for building large front-end applications
        • with a great IDE and mature libraries
        • and some way to communicate with JavaScript
  80. How to make JavaScript faster?
  81. How to make JavaScript faster?
      • Wait for ES6: StructType, const, WeakMap, yield...
      • High-performance built-in libraries
      • WebCL
      • Embed another language
        • KL (FabricEngine), GLSL (WebGL)
      • Wait for quantum computers :)
  82. Things you can also learn
      • NaN tagging
      • Polymorphic inline cache
      • Type inference
      • Regex JIT
      • Runtime optimization
      • ...
  83. References
      • The behavior of efficient virtual machine interpreters on modern architectures
      • Virtual Machine Showdown: Stack Versus Registers
      • The implementation of Lua 5.0
      • Why Is the New Google V8 Engine so Fast?
      • Context Threading: A Flexible and Efficient Dispatch Technique for Virtual Machine Interpreters
      • Effective Inline-Threaded Interpretation of Java Bytecode Using Preparation Sequences
      • Smalltalk-80: the language and its implementation
  84. References
      • Design of the Java HotSpot™ Client Compiler for Java 6
      • Oracle JRockit: The Definitive Guide
      • Virtual Machines: Versatile platforms for systems and processes
      • Fast and Precise Hybrid Type Inference for JavaScript
      • LLVM: A Compilation Framework for Lifelong Program Analysis & Transformation
      • Emscripten: An LLVM-to-JavaScript Compiler
      • An Analysis of the Dynamic Behavior of JavaScript Programs
  85. References
      • Adaptive Optimization for SELF
      • Bytecodes meet Combinators: invokedynamic on the JVM
      • Context Threading: A Flexible and Efficient Dispatch Technique for Virtual Machine Interpreters
      • Efficient Implementation of the Smalltalk-80 System
      • Design, Implementation, and Evaluation of Optimizations in a Just-In-Time Compiler
      • Optimizing direct threaded code by selective inlining
      • Linear scan register allocation
      • Optimizing Invokedynamic
      • Threaded Code
  86. References
      • Why Not a Bytecode VM?
      • A Survey of Adaptive Optimization in Virtual Machines
      • An Efficient Implementation of SELF, a Dynamically-Typed Object-Oriented Language Based on Prototypes
      • Making the Compilation "Pipeline" Explicit: Dynamic Compilation Using Trace Tree Specialization
      • Uniprocessor Garbage Collection Techniques
  87. References
      • Representing Type Information in Dynamically Typed Languages
      • The Behavior of Efficient Virtual Machine Interpreters on Modern Architectures
      • Trace-based Just-in-Time Type Specialization for Dynamic Languages
      • The Structure and Performance of Efficient Interpreters
      • Know Your Engines: How to Make Your JavaScript Fast
      • IE Blog, Chromium Blog, WebKit Blog, Opera Blog, Mozilla Blog, Wingolog's Blog, RednaxelaFX's Blog, David Mandelin's Blog, Brendan Eich's Blog...
  88. Thank you