JavaScript engine performance
© All Rights Reserved


JavaScript engine performance: presentation transcript

  • 1. JavaScript Engine Performance
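  • 2. About me • Senior engineer at Baidu • Currently working mainly on performance optimization • Member of the W3C "HTML" and "Web Performance" working groups @nwind
  • 3. Please note • I am not a virtual machine expert; this is only a hobby • Much of the content has been simplified; the real situation is far more complex • The views here are my own
  • 4. Outline • Basic principles of virtual machines • How JavaScript engines optimize performance • Introduction to V8, Dart, and Node.js • How to write high-performance JavaScript code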
  • 5. VM basic
  • 6. Virtual Machine history• pascal 1970• smalltalk 1980• self 1986• python 1991• java 1995• javascript 1995
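  • 7. The Smalltalk demo showcased three astonishing achievements, including how computers could be networked and how object-oriented programming worked. But Jobs and his team were not interested in these, because their attention was drawn by...
  • 8. How do virtual machines work? • Parser • Intermediate Representation • Interpreter, JIT • Runtime, Garbage Collection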
  • 9. Parser• Tokenize• AST
  • 10. Tokenize: var foo = 10; -> keyword (var), identifier (foo), equal (=), number (10), semicolon (;)
  • 11. AST: Assign -> Variable foo, Constant 10
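The tokenizing and AST slides above can be sketched in a few lines. This is a minimal illustration, not how any real engine parses: it only accepts the slide's `var <name> = <number>;` statement, and the function names `tokenize` and `parseVarStatement` are hypothetical. Token and node names follow the slide.

```javascript
// Illustrative only: handles just the tiny grammar `var <name> = <number>;`.
const KEYWORDS = new Set(["var"]);

function tokenize(source) {
  const text = source.trim();
  const tokens = [];
  // Sticky regex: skip whitespace, then match one lexeme at `lastIndex`.
  const pattern = /\s*([A-Za-z_$][\w$]*|\d+|=|;)/y;
  let position = 0;
  while (position < text.length) {
    pattern.lastIndex = position;
    const match = pattern.exec(text);
    if (match === null) {
      throw new SyntaxError("Unexpected character at offset " + position);
    }
    const lexeme = match[1];
    let type = "identifier";
    if (KEYWORDS.has(lexeme)) type = "keyword";
    else if (/^\d+$/.test(lexeme)) type = "number";
    else if (lexeme === "=") type = "equal";
    else if (lexeme === ";") type = "semicolon";
    tokens.push({ type, lexeme });
    position = pattern.lastIndex;
  }
  return tokens;
}

function parseVarStatement(tokens) {
  const expected = ["keyword", "identifier", "equal", "number", "semicolon"];
  const shapeMatches =
    tokens.length === expected.length &&
    tokens.every((token, index) => token.type === expected[index]);
  if (!shapeMatches) throw new SyntaxError("Expected: var <name> = <number>;");
  // Build the tree from the AST slide: Assign(Variable foo, Constant 10).
  return {
    type: "Assign",
    target: { type: "Variable", name: tokens[1].lexeme },
    value: { type: "Constant", value: Number(tokens[3].lexeme) },
  };
}

const tokens = tokenize("var foo = 10;");
const ast = parseVarStatement(tokens);
```

Real parsers work on a full grammar and usually stream tokens rather than collecting them first; the point here is only the two-stage split.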
  • 12. Intermediate Representation• Bytecode• Stack vs. register
  • 13. Bytecode (SpiderMonkey)
    Source:
      function foo(bar) {
        return bar + 1;
      }
      foo(2);
    Bytecode (main):
      00000: deffun 0 null
      00005: nop
      00006: callvar 0
      00009: int8 2
      00011: call 1
      00014: pop
      00015: stop
    Bytecode (foo):
      00020: getarg 0
      00023: one
      00024: add
      00025: return
      00026: stop
  • 14. Bytecode (JSC)
    Source:
      function foo(bar) {
        return bar + 1;
      }
      foo(2);
    Bytecode (main):
      8 m_instructions; 168 bytes at 0x7fc1ba3070e0; 1 parameter(s); 10 callee register(s)
      [ 0] enter
      [ 1] mov r0, undefined(@k0)
      [ 4] get_global_var r1, 5
      [ 7] mov r2, undefined(@k0)
      [ 10] mov r3, 2(@k1)
      [ 13] call r1, 2, 10
      [ 17] op_call_put_result r0
      [ 19] end r0
      Constants: k0 = undefined k1 = 2
    Bytecode (foo):
      3 m_instructions; 64 bytes at 0x7fc1ba306e80; 2 parameter(s); 1 callee register(s)
      [ 0] enter
      [ 1] add r0, r-7, 1(@k0)
      [ 6] ret r0
      Constants: k0 = 1
      End: 3
  • 15. Stack vs. register • Stack: JVM, .NET, PHP, Python, old JavaScript engines • Register: Lua, Dalvik, modern JavaScript engines; smaller, faster (about 20%~30%); RISC
  • 16. Stack vs. register
    Stack-based:
      local a,t,i    1: PUSHNIL 3
      a=a+i          2: GETLOCAL 0 ; a
                     3: GETLOCAL 2 ; i
                     4: ADD
                     5: SETLOCAL 0 ; a
      a=a+1          6: GETLOCAL 0 ; a
                     7: ADDI 1
                     8: SETLOCAL 0 ; a
      a=t[i]         9: GETLOCAL 1 ; t
                     10: GETINDEXED 2 ; i
                     11: SETLOCAL 0 ; a
    Register-based:
      local a,t,i    1: LOADNIL 0 2 0
      a=a+i          2: ADD 0 0 2
      a=a+1          3: ADD 0 0 250 ; a
      a=t[i]         4: GETTABLE 0 1 2
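To make the instruction-count difference concrete, here is a small sketch of `a = a + i` on two toy machines. Both interpreters (`runStackCode`, `runRegisterCode`) and their instruction encodings are made up for illustration; only the opcode names are borrowed from the listing above. Each returns how many instructions it dispatched.

```javascript
// Hypothetical mini-VMs; opcode names loosely follow the slide's listing.
function runStackCode(code, locals) {
  const stack = [];
  let dispatched = 0;
  for (const [op, arg] of code) {
    dispatched += 1;
    switch (op) {
      case "GETLOCAL":
        stack.push(locals[arg]);
        break;
      case "ADD": {
        const right = stack.pop();
        const left = stack.pop();
        stack.push(left + right);
        break;
      }
      case "SETLOCAL":
        locals[arg] = stack.pop();
        break;
      default:
        throw new Error("Unknown opcode " + op);
    }
  }
  return dispatched;
}

function runRegisterCode(code, registers) {
  let dispatched = 0;
  for (const [op, target, left, right] of code) {
    dispatched += 1;
    switch (op) {
      case "ADD":
        registers[target] = registers[left] + registers[right];
        break;
      default:
        throw new Error("Unknown opcode " + op);
    }
  }
  return dispatched;
}

// a = a + i, with a in slot 0 and i in slot 2 (slot 1 holds t, unused here).
const stackLocals = [5, null, 2];
const stackDispatches = runStackCode(
  [["GETLOCAL", 0], ["GETLOCAL", 2], ["ADD"], ["SETLOCAL", 0]],
  stackLocals
);

const registers = [5, null, 2];
const registerDispatches = runRegisterCode([["ADD", 0, 0, 2]], registers);
```

The stack version dispatches four instructions and the register version one for the same result. Fewer dispatches is one reason register designs can be faster, though each register instruction carries more operands.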
  • 17. Interpreter• Switch statement• Direct threading, Indirect threading, Token threading ...
  • 18. Switch statement
    while (true) {
      switch (opcode) {
        case ADD:
          ...
          break;
        case SUB:
          ...
          break;
        ...
      }
    }
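The switch-based loop above can be filled in as a runnable sketch. The opcode set (`PUSH`, `ADD`, `SUB`, `HALT`) and the flat-array program encoding are assumptions made for this example, not taken from any real engine.

```javascript
// Hypothetical opcodes for a tiny stack machine.
const OP = Object.freeze({ PUSH: 0, ADD: 1, SUB: 2, HALT: 3 });

function run(program) {
  const stack = [];
  let pc = 0; // index of the next instruction to fetch
  while (true) {
    const opcode = program[pc++];
    switch (opcode) {
      case OP.PUSH:
        stack.push(program[pc++]); // the operand follows the opcode
        break;
      case OP.ADD: {
        const right = stack.pop();
        const left = stack.pop();
        stack.push(left + right);
        break;
      }
      case OP.SUB: {
        const right = stack.pop();
        const left = stack.pop();
        stack.push(left - right);
        break;
      }
      case OP.HALT:
        return stack.pop();
      default:
        throw new Error("Unknown opcode " + opcode + " at " + (pc - 1));
    }
  }
}

// Equivalent of evaluating 2 + 1, as in foo(2) from the bytecode slides.
const result = run([OP.PUSH, 2, OP.PUSH, 1, OP.ADD, OP.HALT]);
```

Every instruction goes through the same `switch`, so the hardware sees one shared indirect branch for all opcodes. That is the cost the threading techniques on the next slides try to reduce.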
  • 19. Direct threading
    typedef void *Inst;
    Inst program[] = { &&ADD, &&SUB };
    Inst *ip = program;
    goto **ip++;
    ADD: ... goto **ip++;
    SUB: ...
    http://gcc.gnu.org/onlinedocs/gcc/Labels-as-Values.html
  • 20. Threaded Code
  • 21. http://en.wikipedia.org/wiki/File:Pipeline,_4_stage.svg
  • 22. Context Threading: the essence of the solution is to package bytecode bodies as subroutines and call them. (Figure: the bytecode iload_1, iload_1, iadd, istore_1, iload_1, bipush 64, if_icmplt 2, ... is translated into a Context Threading Table (CTT) of generated code: call iload_1, call iload_1, call iadd, call istore_1, call iload_1, ...; the bytecode bodies are ret-terminated, so the return branch predictor stack can predict them.) Source: "Context Threading: A flexible and efficient dispatch technique for virtual machine interpreters"
  • 23. Garbage Collection• Reference counting (php, python ...), Smart pointer• Tracing • Generational • Stop-the-world, Concurrent, Incremental • Copying, Sweep, Compact
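As a rough illustration of the tracing family, here is a minimal stop-the-world mark-and-sweep over a simulated heap. The heap representation (a `Map` from object id to `{ refs }`) and the `collect` function are hypothetical. Real collectors work on raw memory and are usually generational and incremental as well.

```javascript
// Simulated heap: id -> { refs: [ids of objects this object points to] }.
function collect(heap, roots) {
  // Mark phase: everything reachable from the roots survives.
  const marked = new Set();
  const pending = [...roots];
  while (pending.length > 0) {
    const id = pending.pop();
    if (marked.has(id)) continue;
    marked.add(id);
    pending.push(...heap.get(id).refs);
  }
  // Sweep phase: free every object that was not marked.
  const freed = [...heap.keys()].filter((id) => !marked.has(id));
  for (const id of freed) heap.delete(id);
  return freed;
}

const heap = new Map([
  ["a", { refs: ["b"] }],
  ["b", { refs: [] }],
  ["c", { refs: ["d"] }], // c and d reference each other
  ["d", { refs: ["c"] }], // but nothing reachable points at them
]);
const freed = collect(heap, ["a"]);
```

The unreachable cycle between `c` and `d` is freed here. Plain reference counting would leave it alive, because each object's count never drops to zero.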
  • 24. Why is JavaScript slow? • Dynamic typing • Weak typing • Needs to be parsed every time • GC
  • 25. Fight with Weak Type
  • 26. Object model in most VMs
    typedef union {
      void *p;
      double d;
      long l;
    } Value;
    typedef struct {
      unsigned char type;
      Value value;
    } Object;
    Object a;
  • 27. Tagged pointer
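  • 28. On almost all systems, pointer addresses are aligned (to 4 or 8 bytes) http://www.gnu.org/s/libc/manual/html_node/Aligned-Memory-Blocks.html
  • 29. This means the last 2 or 3 bits of a pointer such as 0xc00ab958 are always 0, so the last bit can be set to 1 to mark a value as a pointer. (Diagram: the low byte 0x98 is 1001 1000 in binary; the last bit distinguishes Pointer from Small Number.)
  • 30. Tagged pointer (memory diagram): var a = 1 is stored as the value 2; var b = {a:1} is stored as the pointer 0x3d2aa00, which points to object b
  • 31. Small Number: $2^{30} - 1 = 1073741823$, $-2^{30} = -1073741824$. 31 bits can represent about a billion, which is enough for most applications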
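The tagging scheme on these slides can be simulated with JavaScript's 32-bit integer operators. This is a sketch under stated assumptions: the helper names are invented, a small number is stored shifted left by one with a 0 in the last bit, and a pointer gets a 1 in the last bit. Because JavaScript bitwise operators work on signed 32-bit values, the simulation only accepts addresses below 2^31.

```javascript
const SMI_MIN = -(2 ** 30); // -1073741824
const SMI_MAX = 2 ** 30 - 1; //  1073741823

// Small number: shift left by one, leaving 0 in the tag bit.
function encodeSmi(value) {
  if (!Number.isInteger(value) || value < SMI_MIN || value > SMI_MAX) {
    throw new RangeError("Value does not fit in 31 bits: " + value);
  }
  return value << 1;
}

// Pointer: aligned addresses end in 00, so the last bit is free for the tag.
function tagPointer(address) {
  if (!Number.isInteger(address) || address < 0 || address >= 2 ** 31) {
    throw new RangeError("This simulation only supports 31-bit addresses");
  }
  if ((address & 3) !== 0) {
    throw new RangeError("Address must be 4-byte aligned");
  }
  return address | 1;
}

function isSmi(word) {
  return (word & 1) === 0;
}

function decodeSmi(word) {
  return word >> 1; // arithmetic shift keeps the sign
}

function untagPointer(word) {
  return word & ~1;
}

const wordForOne = encodeSmi(1); // the slide's `var a = 1`
const wordForObject = tagPointer(0x3d2aa00); // the slide's object address
```

Checking `isSmi(word)` costs one AND and one compare, with no memory access. That is why engines can do integer arithmetic without allocating a boxed object.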
  • 32. External Fixed Typed Array• Strong type, Fixed length• Out of VM heap• Example: Int32Array, Float64Array
  • 33. Small Number + Typed Array (Bar chart: fannkuch-redux run time in seconds, smaller is better, for C/C++, Java (HotSpot), V8, PHP, Ruby, Python; annotated "40x".) http://shootout.alioth.debian.org/u32/performance.php?test=fannkuchredux
  • 34. Warning: Benchmark lies
  • 35. ES6 will have struct
  • 36. ES6 StructType
    Point2D = new StructType({
      x: uint32,
      y: uint32
    });
    Color = new StructType({
      r: uint8,
      g: uint8,
      b: uint8
    });
    Pixel = new StructType({
      point: Point2D,
      color: Color
    });
  • 37. Use typed arrays to run faster
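A short sketch of the two typed array properties named on the earlier slide: strong element type and fixed length. It uses only the standard `Int32Array` and `Float64Array` constructors. The helper `sum` and the sample values are made up for illustration.

```javascript
// Fixed length: four 32-bit integers, zero-initialized.
const samples = new Int32Array(4);
samples.set([1, 2, 3, 4]);

// Strong type: a non-integer write is converted to a 32-bit integer.
samples[0] = 1.9; // stored as 1

// Fixed length: a write past the end is ignored and does not grow the array.
samples[10] = 99;

function sum(values) {
  let total = 0;
  for (let i = 0; i < values.length; i++) {
    total += values[i];
  }
  return total;
}

const total = sum(samples);

// Float64Array stores raw doubles, 8 bytes per element.
const weights = new Float64Array([0.5, 0.25]);
```

Because every element has the same known size and type, an engine can read and write elements without per-value type checks. How much that helps in practice depends on the engine and the workload.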
  • 38. Fight with Dynamic Type
  • 39. foo.bar
  • 40. foo.bar in C
    movl 4(%edx), %ecx //get
    movl %ecx, 4(%edx) //put
  • 41. foo.bar in JavaScript
    found = HashTable.FindEntry(key)
    if (found) return found;
    for (pt = GetPrototype(); pt != null; pt = pt.GetPrototype()) {
      found = pt.HashTable.FindEntry(key)
      if (found) return found;
    }
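The lookup pseudocode above can be written as a runnable model. This is only a sketch of the naive strategy: objects are modeled as a hash table plus a prototype link, and `makeObject` and `lookup` are hypothetical names. It also counts how many tables were probed.

```javascript
// Model an object as its own hash table plus a link to its prototype.
function makeObject(properties, prototype = null) {
  return { table: new Map(Object.entries(properties)), prototype };
}

// Probe the object's table, then walk up the prototype chain.
function lookup(object, key) {
  let steps = 0;
  for (let current = object; current !== null; current = current.prototype) {
    steps += 1;
    if (current.table.has(key)) {
      return { value: current.table.get(key), steps };
    }
  }
  return { value: undefined, steps };
}

const base = makeObject({ bar: 42 });
const foo = makeObject({ own: 1 }, base);

const inherited = lookup(foo, "bar"); // found on the prototype
const missing = lookup(foo, "nope"); // walks the whole chain
```

Compared with the single `movl` in the C version, each access here costs at least one hash probe, and more for inherited or missing properties. This is the gap that hidden classes and inline caches aim to close.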
  • 42. How to optimize?
  • 43. First, we need to know the object layout
  • 44. Add a type (hidden class) to each object: add property x -> add property y http://code.google.com/apis/v8/design.html
  • 45. Inline Cache • Slow lookup the first time • Modify the JIT code in place • Next time, jump directly to the address
  • 46. Inline cache made simple
    function fun(foo) {
      return foo.bar;
    }
    Before caching:
      return foo.lookupProperty(bar);
    After caching:
      if (foo[hiddenClass] == 0xfe1) {
        return foo[indexOf_bar];
      }
      return foo.lookupProperty(bar);
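Hidden classes and the inline cache can be modeled together in plain JavaScript. Treat this as a sketch: `HiddenClass`, `createObject`, `setProperty` and `makeCachedGetter` are invented for illustration. A real engine patches generated machine code, whereas this version caches in a closure.

```javascript
class HiddenClass {
  constructor(offsets) {
    this.offsets = offsets; // Map: property name -> slot index
    this.transitions = new Map(); // Map: added property -> next HiddenClass
  }

  // Objects that add the same properties in the same order share classes.
  withProperty(name) {
    if (!this.transitions.has(name)) {
      const offsets = new Map(this.offsets);
      offsets.set(name, offsets.size);
      this.transitions.set(name, new HiddenClass(offsets));
    }
    return this.transitions.get(name);
  }
}

const EMPTY_CLASS = new HiddenClass(new Map());

function createObject() {
  return { hiddenClass: EMPTY_CLASS, slots: [] };
}

function setProperty(object, name, value) {
  if (!object.hiddenClass.offsets.has(name)) {
    object.hiddenClass = object.hiddenClass.withProperty(name);
  }
  object.slots[object.hiddenClass.offsets.get(name)] = value;
}

// One cache per access site, like `foo.bar` inside one function.
function makeCachedGetter(name) {
  let cachedClass = null;
  let cachedOffset = -1;
  const stats = { hits: 0, misses: 0 };
  function get(object) {
    if (object.hiddenClass === cachedClass) {
      stats.hits += 1; // fast path: one comparison, one indexed load
      return object.slots[cachedOffset];
    }
    stats.misses += 1; // slow path: full lookup, then remember the result
    const offset = object.hiddenClass.offsets.get(name);
    if (offset === undefined) return undefined;
    cachedClass = object.hiddenClass;
    cachedOffset = offset;
    return object.slots[offset];
  }
  return { get, stats };
}

const first = createObject();
setProperty(first, "x", 1);
setProperty(first, "y", 2);
const second = createObject();
setProperty(second, "x", 3);
setProperty(second, "y", 4);
const reversed = createObject(); // same properties, different order
setProperty(reversed, "y", 5);
setProperty(reversed, "x", 6);

const getY = makeCachedGetter("y");
const results = [getY.get(first), getY.get(second), getY.get(reversed)];
```

`first` and `second` share a hidden class, so the second access hits the cache. `reversed` added its properties in a different order, got a different class, and misses.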
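  • 47. JavaScript in real-world code is not that dynamic • Delete operations account for only 0.1% ("An Analysis of the Dynamic Behavior of JavaScript...") • 99% of primitive types can be determined at run time through static analysis • 97% of property accesses can be inline cached ("TypeCastor: Demystify Dynamic Typing of JavaScript...")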
  • 48. V8 can't handle delete yet: 20x slower! http://jsperf.com/test-v8-delete
  • 49. Avoid altering object property layout
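One way to follow this advice is to create objects with the same properties in the same order and to clear a value by assignment rather than with `delete`. The sketch below is illustrative: `makePoint` and `clearY` are hypothetical helpers, and whether this is measurably faster depends on the engine and version.

```javascript
// Always create the same properties in the same order.
function makePoint(x, y) {
  return { x, y };
}

// Clear the value but keep the property, so the layout does not change.
function clearY(point) {
  point.y = undefined;
  return point;
}

const kept = clearY(makePoint(1, 2));

// For contrast: `delete` removes the property and changes the layout.
const altered = makePoint(3, 4);
delete altered.y;
```

The two approaches are not semantically identical: `"y" in kept` is still true while `"y" in altered` is false. Assignment is only a drop-in replacement when the code does not rely on the property being absent.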
  • 50. Faster Data Structure & Algorithm
  • 51. Array push is faster than String concat?
  • 52. http://jsperf.com/nwind-string-concat-vs-array-push
  • 53. Why?
  • 54. other string optimizations• Adaptive string search • Single char, Linear, Boyer-Moore-Horspool• Adaptive ascii and utf-8• Zero copy sub string
  • 55. Feel free to use strings in modern engines
  • 56. Just-In-Time (JIT)
  • 57. JIT• Method JIT, Trace JIT, Regular expression JIT• Register allocation• Code generation
  • 58. How does a JIT work? • mmap, malloc (mprotect) • generate native code • cast (C), reinterpret_cast (C++) • call the function
  • 59. V8
  • 60. V8 • Lars Bak • Hidden Class, PICs • Some built-in objects are written in JavaScript • Crankshaft • Precise generational GC
  • 61. Lars Bak • implementing VMs since 1988 • Beta • Self • JVM (VM architect at Sun) • V8 (Google)
  • 62. Lines of code (VM only) (Stacked bar chart: lines of .cpp/.c and .h code for HotSpot, V8, SpiderMonkey, JSC, Ruby, CPython, PHP-Zend; y-axis from 0 to 500000.)
  • 63. Crankshaft
  • 64. Crankshaft pipeline: Source code -> Native Code -> runtime profiling -> [High-Level IR -> Low-Level IR -> Opt Native Code] (the bracketed stages are Crankshaft)
  • 65. Crankshaft• Profiling• Compiler optimization• Generate new JIT code• On-stack replacement• Deoptimize
  • 66. High-Level IR (Hydrogen)• AST to SSA• Type inference (type feedback from inline cache)• Compiler optimization • Function inline • Loop-invariant code motion, Global value numbering • Eliminate dead phis • ...
  • 67. Loop-invariant code motion
    Before:
      for (i = 0; i < n; i++) {
        a[i] = x + y;
      }
    After:
      tmp = x + y;
      for (i = 0; i < n; i++) {
        a[i] = tmp;
      }
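The transformation above, written out as two runnable functions so the results can be compared. `fillNaive` and `fillHoisted` are illustrative names. An optimizing compiler would perform this rewrite on its intermediate representation rather than on source code.

```javascript
// Before: x + y is recomputed on every iteration.
function fillNaive(a, n, x, y) {
  for (let i = 0; i < n; i++) {
    a[i] = x + y;
  }
  return a;
}

// After: x + y does not change inside the loop, so compute it once.
function fillHoisted(a, n, x, y) {
  const tmp = x + y;
  for (let i = 0; i < n; i++) {
    a[i] = tmp;
  }
  return a;
}

const naive = fillNaive([], 4, 2, 3);
const hoisted = fillHoisted([], 4, 2, 3);
```

Hoisting is only valid because `x + y` has no side effects and neither operand is modified in the loop. In JavaScript the compiler must first establish, for example through type feedback, that `+` here is plain number addition.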
  • 68. Function inlining limits for now • big functions (larger than 600 bytes) • recursive functions • functions with unsupported statements: with, switch, try/catch/finally, ...
  • 69. Avoid “with”, “switch” and “try” in hot path
  • 70. Built-in objects written in JS (v8/src/array.js)
    function ArraySort(comparefn) {
      ...
      // In-place QuickSort algorithm.
      // For short (length <= 22) arrays, insertion sort is used for efficiency.
      if (!IS_SPEC_FUNCTION(comparefn)) {
        comparefn = function (x, y) {
          if (x === y) return 0;
          if (%_IsSmi(x) && %_IsSmi(y)) {
            return %SmiLexicographicCompare(x, y);
          }
          x = ToString(x);
          y = ToString(y);
          if (x == y) return 0;
          else return x < y ? -1 : 1;
        };
      }
      ...
  • 71. GC • Precise • Stop-the-world • Generational • Incremental (2011-10)
  • 72. V8 performance
  • 73. V8 performance
  • 74. V8 performance Why?
  • 75. V8 performance: unfair, they are using the gmp library
  • 76. Warning: Benchmark lies
  • 77. Node.JS • Pros: easy to write async I/O; one language for everything; maybe faster than PHP, Python; betting on JavaScript is safe • Cons: lack of great libraries; large JS is hard to maintain; easy to have memory leaks (compared to PHP, Erlang); still too young, unproven
  • 78. Why Dart? • Built for large applications: optional types, structured, libraries, tools • Performance: lightweight processes like Erlang; easier to write a fast VM for than JavaScript
  • 79. The future of Dart? • It will not replace JS • But it may replace GWT and become a better choice for building large front-end applications • with a great IDE, mature libraries • and some way to communicate with JavaScript
  • 80. How to make JavaScript faster?
  • 81. How to make JavaScript faster? • Wait for ES6: StructType, const, WeakMap, yield... • High-performance built-in libraries • WebCL • Embed another language: KL (FabricEngine), GLSL (WebGL) • Wait for quantum computers :)
  • 82. Things you can also learn • NaN tagging • Polymorphic Inline Cache • Type Inference • Regex JIT • Runtime optimization • ...
  • 83. References • The Behavior of Efficient Virtual Machine Interpreters on Modern Architectures • Virtual Machine Showdown: Stack Versus Registers • The Implementation of Lua 5.0 • Why Is the New Google V8 Engine so Fast? • Context Threading: A Flexible and Efficient Dispatch Technique for Virtual Machine Interpreters • Effective Inline-Threaded Interpretation of Java Bytecode Using Preparation Sequences • Smalltalk-80: The Language and its Implementation
  • 84. References • Design of the Java HotSpot(TM) Client Compiler for Java 6 • Oracle JRockit: The Definitive Guide • Virtual Machines: Versatile Platforms for Systems and Processes • Fast and Precise Hybrid Type Inference for JavaScript • LLVM: A Compilation Framework for Lifelong Program Analysis & Transformation • Emscripten: An LLVM-to-JavaScript Compiler • An Analysis of the Dynamic Behavior of JavaScript Programs
  • 85. References • Adaptive Optimization for SELF • Bytecodes meet Combinators: invokedynamic on the JVM • Context Threading: A Flexible and Efficient Dispatch Technique for Virtual Machine Interpreters • Efficient Implementation of the Smalltalk-80 System • Design, Implementation, and Evaluation of Optimizations in a Just-In-Time Compiler • Optimizing Direct Threaded Code by Selective Inlining • Linear Scan Register Allocation • Optimizing Invokedynamic • Threaded Code
  • 86. References • Why Not a Bytecode VM? • A Survey of Adaptive Optimization in Virtual Machines • An Efficient Implementation of SELF, a Dynamically-Typed Object-Oriented Language Based on Prototypes • Making the Compilation "Pipeline" Explicit: Dynamic Compilation Using Trace Tree Specialization • Uniprocessor Garbage Collection Techniques
  • 87. References • Representing Type Information in Dynamically Typed Languages • The Behavior of Efficient Virtual Machine Interpreters on Modern Architectures • Trace-based Just-in-Time Type Specialization for Dynamic Languages • The Structure and Performance of Efficient Interpreters • Know Your Engines: How to Make Your JavaScript Fast • IE Blog, Chromium Blog, WebKit Blog, Opera Blog, Mozilla Blog, Wingolog's Blog, RednaxelaFX's Blog, David Mandelin's Blog, Brendan Eich's Blog...
  • 88. Thank you