JavaScript Engine Performance
 


    • JavaScript Engine Performance
    • About me
      • Senior engineer at Baidu
      • Currently working mainly on performance optimization
      • Participant in the W3C “HTML” and “Web Performance” working groups
      • @nwind
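    • Please note
      • I am not a virtual machine expert; this is just a hobby of mine
      • Much of the content has been simplified; the reality is a lot more complex
      • The views expressed here are my own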
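    • Outline
      • Basic principles of virtual machines
      • How JavaScript engines optimize performance
      • An introduction to V8, Dart, and Node.js
      • How to write high-performance JavaScript code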
    • VM basics
    • Virtual machine history
      • Pascal, 1970
      • Smalltalk, 1980
      • Self, 1986
      • Python, 1991
      • Java, 1995
      • JavaScript, 1995
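    • The Smalltalk demonstration showcased three astonishing achievements, including how computers could be networked together and how object-oriented programming worked. But Jobs and his team were not interested in these, because their attention had been caught by...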
    • How does a virtual machine work?
      • Parser
      • Intermediate representation
      • Interpreter, JIT
      • Runtime, garbage collection
    • Parser
      • Tokenize
      • Build an AST
    • Tokenize: "var foo = 10;" splits into
      • keyword: var
      • identifier: foo
      • equal: =
      • number: 10
      • semicolon: ;
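The tokenizing step above can be sketched as a tiny scanner built on a sticky regular expression. This is a minimal sketch that assumes only the five token kinds from the slide (keyword, identifier, equal, number, semicolon); the `tokenize` function and the token object shape are hypothetical, not taken from any engine.

```javascript
// Minimal tokenizer sketch for statements like `var foo = 10;`.
// Only the token kinds shown on the slide are supported (an assumption).
const KEYWORDS = new Set(["var"]);

function tokenize(source) {
  const tokens = [];
  // Alternatives: whitespace | identifier/keyword | number | "=" | ";"
  const re = /\s+|([A-Za-z_$][\w$]*)|(\d+)|(=)|(;)/y;
  while (re.lastIndex < source.length) {
    const start = re.lastIndex;
    const m = re.exec(source);
    if (m === null) {
      throw new SyntaxError(`Unexpected character at ${start}: ${source[start]}`);
    }
    if (m[1] !== undefined) {
      tokens.push({ kind: KEYWORDS.has(m[1]) ? "keyword" : "identifier", text: m[1] });
    } else if (m[2] !== undefined) {
      tokens.push({ kind: "number", text: m[2] });
    } else if (m[3] !== undefined) {
      tokens.push({ kind: "equal", text: m[3] });
    } else if (m[4] !== undefined) {
      tokens.push({ kind: "semicolon", text: m[4] });
    }
    // Whitespace matches none of the capture groups and is simply skipped.
  }
  return tokens;
}
```

For example, `tokenize("var foo = 10;").map(t => t.kind)` gives the same five kinds listed on the slide.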
    • AST
      • AssignVariable foo
        • Constant 10
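Turning those tokens into the AST node shown above can be sketched as a single recursive-descent rule. This sketch assumes the grammar is only `var <identifier> = <number>;`; the node names `AssignVariable` and `Constant` mirror the slide's labels, but the object layout and the `parseVarStatement` name are hypothetical.

```javascript
// Parses tokens for `var foo = 10;` into
// { type: "AssignVariable", name: "foo", value: { type: "Constant", value: 10 } }.
// Assumed grammar: "var" identifier "=" number ";"
function parseVarStatement(tokens) {
  let pos = 0;
  function expect(kind, text) {
    const tok = tokens[pos];
    if (!tok || tok.kind !== kind || (text !== undefined && tok.text !== text)) {
      throw new SyntaxError(`Expected ${text || kind} at token ${pos}`);
    }
    pos++;
    return tok;
  }
  expect("keyword", "var");
  const name = expect("identifier").text;
  expect("equal");
  const value = Number(expect("number").text);
  expect("semicolon");
  return { type: "AssignVariable", name, value: { type: "Constant", value } };
}

// Tokens written out by hand, in the form a scanner would produce for `var foo = 10;`.
const tokens = [
  { kind: "keyword", text: "var" },
  { kind: "identifier", text: "foo" },
  { kind: "equal", text: "=" },
  { kind: "number", text: "10" },
  { kind: "semicolon", text: ";" },
];
const ast = parseVarStatement(tokens);
```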
    • Intermediate representation
      • Bytecode
      • Stack vs. register
    • Bytecode (SpiderMonkey)
      Source:
        function foo(bar) {
            return bar + 1;
        }
        foo(2);
      Top level:
        00000: deffun 0 null
        00005: nop
        00006: callvar 0
        00009: int8 2
        00011: call 1
        00014: pop
        00015: stop
      foo:
        00020: getarg 0
        00023: one
        00024: add
        00025: return
        00026: stop
    • Bytecode (JSC)
      Source:
        function foo(bar) {
            return bar + 1;
        }
        foo(2);
      Top level: 8 m_instructions; 168 bytes at 0x7fc1ba3070e0; 1 parameter(s); 10 callee register(s)
        [ 0] enter
        [ 1] mov                 r0, undefined(@k0)
        [ 4] get_global_var      r1, 5
        [ 7] mov                 r2, undefined(@k0)
        [10] mov                 r3, 2(@k1)
        [13] call                r1, 2, 10
        [17] op_call_put_result  r0
        [19] end                 r0
        Constants: k0 = undefined, k1 = 2
      foo: 3 m_instructions; 64 bytes at 0x7fc1ba306e80; 2 parameter(s); 1 callee register(s)
        [ 0] enter
        [ 1] add                 r0, r-7, 1(@k0)
        [ 6] ret                 r0
        Constants: k0 = 1
        End: 3
    • Stack vs. register
      • Stack
        • JVM, .NET, PHP, Python, older JavaScript engines
      • Register
        • Lua, Dalvik, modern JavaScript engines
        • Fewer instructions executed, faster (about 20% to 30%)
        • RISC-like instructions
    • Stack vs. register
      Stack-based:
        local a,t,i    1: PUSHNIL     3
        a=a+i          2: GETLOCAL    0       ; a
                       3: GETLOCAL    2       ; i
                       4: ADD
                       5: SETLOCAL    0       ; a
        a=a+1          6: GETLOCAL    0       ; a
                       7: ADDI        1
                       8: SETLOCAL    0       ; a
        a=t[i]         9: GETLOCAL    1       ; t
                      10: GETINDEXED  2       ; i
                      11: SETLOCAL    0       ; a
      Register-based:
        local a,t,i    1: LOADNIL     0 2 0
        a=a+i          2: ADD         0 0 2
        a=a+1          3: ADD         0 0 250 ; a
        a=t[i]         4: GETTABLE    0 1 2
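The difference between the two listings can be reproduced with two toy interpreters that both compute `a = a + i`. This is a minimal sketch with made-up opcodes (`GETLOCAL`, `ADD`, `SETLOCAL` for the stack machine; a three-operand `ADD dst, lhs, rhs` for the register machine) that only loosely follow the listings; it is not Lua's actual instruction encoding.

```javascript
// Stack machine: operands travel through an explicit value stack.
function runStack(code, locals) {
  const stack = [];
  let executed = 0;
  for (const [op, arg] of code) {
    executed++;
    switch (op) {
      case "GETLOCAL": stack.push(locals[arg]); break;
      case "SETLOCAL": locals[arg] = stack.pop(); break;
      case "ADD": { const b = stack.pop(); const a = stack.pop(); stack.push(a + b); break; }
      default: throw new Error(`unknown opcode ${op}`);
    }
  }
  return executed;
}

// Register machine: each instruction names its destination and source registers.
function runRegister(code, regs) {
  let executed = 0;
  for (const [op, dst, lhs, rhs] of code) {
    executed++;
    switch (op) {
      case "ADD": regs[dst] = regs[lhs] + regs[rhs]; break;
      default: throw new Error(`unknown opcode ${op}`);
    }
  }
  return executed;
}

// Local / register numbering as in the listings: 0 = a, 1 = t, 2 = i
const stackCode = [["GETLOCAL", 0], ["GETLOCAL", 2], ["ADD"], ["SETLOCAL", 0]];
const registerCode = [["ADD", 0, 0, 2]];

const stackLocals = [5, null, 3];
const registerLocals = [5, null, 3];
const stackCount = runStack(stackCode, stackLocals);
const registerCount = runRegister(registerCode, registerLocals);
```

Both leave `a = 8` when starting from `a = 5, i = 3`, but the stack version dispatches four instructions where the register version dispatches one; fewer dispatches is the usual explanation for the register design's speed advantage.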
    • Interpreter
      • Switch statement
      • Direct threading, indirect threading, token threading, ...
    • Switch statement
        while (true) {
            opcode = *ip++;    /* fetch the next opcode */
            switch (opcode) {
                case ADD:
                    ...
                    break;
                case SUB:
                    ...
                    break;
                ...
            }
        }
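A runnable version of the switch loop above, written in JavaScript rather than C. This sketch assumes a made-up instruction set (`PUSH`, `ADD`, `SUB`, `HALT`) over a value stack, with operands stored inline after their opcode.

```javascript
// Hypothetical opcodes for a tiny stack machine.
const PUSH = 0, ADD = 1, SUB = 2, HALT = 3;

function interpret(program) {
  const stack = [];
  let pc = 0;
  while (true) {
    const opcode = program[pc++];   // fetch
    switch (opcode) {               // decode and dispatch
      case PUSH:
        stack.push(program[pc++]);  // the operand follows the opcode
        break;
      case ADD: {
        const b = stack.pop(), a = stack.pop();
        stack.push(a + b);
        break;
      }
      case SUB: {
        const b = stack.pop(), a = stack.pop();
        stack.push(a - b);
        break;
      }
      case HALT:
        return stack.pop();
      default:
        throw new Error(`bad opcode ${opcode} at ${pc - 1}`);
    }
  }
}

// (10 + 2) - 5
const result = interpret([PUSH, 10, PUSH, 2, ADD, PUSH, 5, SUB, HALT]);
```

Every instruction goes back through the single `switch` at the top of the loop; the threading techniques listed above aim to cut that per-instruction dispatch cost.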
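    • Direct threading
        typedef void *Inst;
        Inst program[] = { &&ADD, &&SUB };   /* label addresses (GCC extension) */
        Inst *ip = program;
        goto **ip++;                          /* jump to the first instruction */
        ADD:
            ...
            goto **ip++;                      /* jump straight to the next instruction */
        SUB:
            ...
      http://gcc.gnu.org/onlinedocs/gcc/Labels-as-Values.html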
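JavaScript has no computed `goto`, so direct threading itself cannot be written in it. A related technique that can be is call threading: decode the bytecode once into an array of handler functions, then execute by calling them in order, which replaces the per-instruction `switch` with one indirect call. This is a swapped-in technique, not the GCC labels-as-values version above, and the opcode names are hypothetical.

```javascript
// Handler factories: each takes the decoded operand and returns a closure
// that executes one instruction against the machine state.
const handlers = new Map([
  ["PUSH", (arg) => (m) => { m.stack.push(arg); }],
  ["ADD", () => (m) => { const b = m.stack.pop(); m.stack.push(m.stack.pop() + b); }],
  ["SUB", () => (m) => { const b = m.stack.pop(); m.stack.push(m.stack.pop() - b); }],
]);

// Threading pass: resolve every opcode to its handler once, up front.
function thread(bytecode) {
  return bytecode.map(([op, arg]) => {
    const make = handlers.get(op);
    if (make === undefined) throw new Error(`unknown opcode ${op}`);
    return make(arg);
  });
}

// Execution: no opcode decoding, one indirect call per instruction.
function run(threaded) {
  const machine = { stack: [] };
  for (const step of threaded) step(machine);
  return machine.stack.pop();
}

// (10 + 2) - 5
const program = thread([["PUSH", 10], ["PUSH", 2], ["ADD"], ["PUSH", 5], ["SUB"]]);
const value = run(program);
```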
    • Threaded Code
    • Figure: 4-stage instruction pipeline (http://en.wikipedia.org/wiki/File:Pipeline,_4_stage.svg)
    • Context threading
      • Essence of the solution: package bytecode bodies as subroutines and call them
      • Bytecode: iload_1, iload_1, iadd, istore_1, iload_1, bipush 64, if_icmplt 2, ...
      • CTT, the Context Threading Table (generated code): call iload_1; call iload_1; call iadd; call istore_1; call iload_1; ...
      • Bytecode bodies (ret-terminated): iload_1: ... ret; iadd: ... ret;
      • Each dispatch becomes a native call/return pair, which the CPU's return branch predictor stack can predict
      • From “Context Threading: A Flexible and Efficient Dispatch Technique for Virtual Machine Interpreters”
    • Garbage collection
      • Reference counting (PHP, Python, ...), smart pointers
      • Tracing
        • Generational
        • Stop-the-world, concurrent, incremental
        • Copying, sweep, compact
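Of the tracing collectors listed above, the simplest is stop-the-world mark-and-sweep. This is a minimal sketch over a simulated heap, assuming objects are plain records with a `refs` array and that the root set is passed in explicitly; a real collector works on raw memory and finds its roots on the stack and in registers. The `Heap` class is hypothetical.

```javascript
// Simulated heap: every allocated object is tracked here.
class Heap {
  constructor() {
    this.objects = new Set();
  }
  allocate(name) {
    const obj = { name, refs: [], marked: false };
    this.objects.add(obj);
    return obj;
  }
  // Stop-the-world collection: mark everything reachable, then sweep the rest.
  collect(roots) {
    const worklist = [...roots];
    while (worklist.length > 0) {          // mark phase
      const obj = worklist.pop();
      if (obj.marked) continue;
      obj.marked = true;
      worklist.push(...obj.refs);
    }
    let freed = 0;
    for (const obj of this.objects) {      // sweep phase
      if (obj.marked) {
        obj.marked = false;                // reset for the next cycle
      } else {
        this.objects.delete(obj);
        freed++;
      }
    }
    return freed;
  }
}

const heap = new Heap();
const root = heap.allocate("root");
const child = heap.allocate("child");
const a = heap.allocate("cycle-a");
const b = heap.allocate("cycle-b");
root.refs.push(child);
a.refs.push(b);
b.refs.push(a);                            // a and b form an unreachable cycle
const freed = heap.collect([root]);
```

The unreachable cycle between `a` and `b` is freed, which plain reference counting cannot do without extra cycle detection.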
    • Why is JavaScript slow?
      • Dynamic typing
      • Weak typing
      • Source must be parsed every time
      • GC
    • Fighting weak typing
    • Object model in most VMs
        typedef union {
            void *p;
            double d;
            long l;
        } Value;

        typedef struct {
            unsigned char type;   /* tag saying which union member is valid */
            Value value;
        } Object;

        Object a;
    • Tagged pointer
    • On almost all systems, pointer addresses are aligned (to 4 or 8 bytes): http://www.gnu.org/s/libc/manual/html_node/Aligned-Memory-Blocks.html
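    • This means the last 2 or 3 bits of a pointer such as 0xc00ab958 are always 0, so the last bit can be set to 1 to mark a pointer, while small numbers keep it at 0
      Figure: low bits of a pointer vs. a small number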
    • Tagged pointer (figure): in memory, var a = 1 is stored directly as the value 2 (1 shifted left by one bit), while var b = {a:1} holds 0x3d2aa00, the address of object b
    • Small number: $2^{30} - 1 = 1073741823$, $-2^{30} = -1073741824$. 31 bits can represent integers up to about one billion in magnitude, which is enough for most applications
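The tagging scheme from the last few slides can be simulated with 32-bit words. This sketch assumes the convention described above: pointers are at least 4-byte aligned and get their low bit set to 1, while small numbers keep a 0 low bit and store the value shifted left by one, which yields exactly the 31-bit range just computed. All function names are hypothetical.

```javascript
// Words are modeled as unsigned 32-bit integers held in JS numbers.
const SMALL_MAX = 2 ** 30 - 1;   // 1073741823
const SMALL_MIN = -(2 ** 30);    // -1073741824

function tagSmall(n) {
  if (!Number.isInteger(n) || n < SMALL_MIN || n > SMALL_MAX) {
    throw new RangeError(`${n} does not fit in 31 bits`);
  }
  return (n << 1) >>> 0;          // low bit 0 marks a small number
}

function tagPointer(address) {
  if (address % 4 !== 0) throw new RangeError("pointer must be 4-byte aligned");
  return (address | 1) >>> 0;     // low bit 1 marks a pointer
}

const isPointer = (word) => (word & 1) === 1;
const untagSmall = (word) => (word | 0) >> 1;      // arithmetic shift keeps the sign
const untagPointer = (word) => (word & ~1) >>> 0;  // clear the tag bit
```

With this scheme `tagSmall(1)` is 2, matching the memory figure, and `tagPointer(0xc00ab958)` sets only the lowest bit.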
    • External fixed typed arrays
      • Strongly typed, fixed length
      • Stored outside the VM heap
      • Examples: Int32Array, Float64Array
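The strong typing and fixed length of typed arrays can be observed directly. A short demonstration using the standard `Int32Array` and `Float64Array` constructors; the specific values are arbitrary.

```javascript
// Int32Array: every element is a 32-bit signed integer.
const ints = new Int32Array(4);     // fixed length, zero-filled
ints[0] = 2 ** 31;                  // converted with ToInt32: wraps to -2147483648
ints[1] = 3.9;                      // fractional part is dropped: 3

// Float64Array: IEEE-754 doubles in one contiguous buffer.
const doubles = new Float64Array([0.5, 1.5]);
let sum = 0;
for (let i = 0; i < doubles.length; i++) sum += doubles[i];

// A second view over the same ArrayBuffer sees the same bytes.
const alias = new Int32Array(ints.buffer);
alias[3] = 42;
```

Typed arrays have no `push`, so their length stays what it was at construction, and every store is converted to the element type rather than changing the array's representation.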
    • Figure: Small Number + Typed Array, fannkuch-redux run time in seconds (smaller is better) for C/C++, Java (HotSpot), V8, PHP, Ruby, and Python. Source: http://shootout.alioth.debian.org/u32/performance.php?test=fannkuchredux
    • Warning: benchmarks lie
    • ES6 may get structs (the binary data proposal)
    • ES6 StructType (proposed syntax)
        Point2D = new StructType({
            x: uint32,
            y: uint32
        });
        Color = new StructType({
            r: uint8,
            g: uint8,
            b: uint8
        });
        Pixel = new StructType({
            point: Point2D,
            color: Color
        });
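StructType was a proposal and cannot be run as written above. Roughly the same `Pixel` layout can be emulated with an `ArrayBuffer` and a `DataView`. This sketch assumes a packed layout (x and y as little-endian uint32 at offsets 0 and 4, then r, g, b as uint8 at offsets 8 to 10, 11 bytes per pixel with no padding); the proposal may have laid fields out differently, and `PixelArray` is a hypothetical name.

```javascript
// Packed layout for Pixel = { point: { x, y: uint32 }, color: { r, g, b: uint8 } }
const PIXEL_SIZE = 11;  // 4 + 4 + 1 + 1 + 1 bytes, no padding (assumed)
const OFFSET = { x: 0, y: 4, r: 8, g: 9, b: 10 };

class PixelArray {
  constructor(count) {
    this.count = count;
    this.view = new DataView(new ArrayBuffer(count * PIXEL_SIZE));
  }
  set(i, { x, y, r, g, b }) {
    const base = i * PIXEL_SIZE;
    this.view.setUint32(base + OFFSET.x, x, true);   // true = little-endian
    this.view.setUint32(base + OFFSET.y, y, true);
    this.view.setUint8(base + OFFSET.r, r);
    this.view.setUint8(base + OFFSET.g, g);
    this.view.setUint8(base + OFFSET.b, b);
  }
  get(i) {
    const base = i * PIXEL_SIZE;
    return {
      x: this.view.getUint32(base + OFFSET.x, true),
      y: this.view.getUint32(base + OFFSET.y, true),
      r: this.view.getUint8(base + OFFSET.r),
      g: this.view.getUint8(base + OFFSET.g),
      b: this.view.getUint8(base + OFFSET.b),
    };
  }
}

const pixels = new PixelArray(2);
pixels.set(1, { x: 640, y: 480, r: 255, g: 128, b: 0 });
```

`DataView` is used instead of a `Uint32Array` view because the second pixel starts at byte 11, and typed array views require offsets aligned to their element size.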
    • Use typed arrays to run faster
    • Fighting dynamic typing
    • foo.bar
    • foo.bar in C
        movl 4(%edx), %ecx    // get
        movl %ecx, 4(%edx)    // put
    • foo.bar in JavaScript
        found = HashTable.FindEntry(key);
        if (found) return found;
        for (pt = GetPrototype(); pt != null; pt = pt.GetPrototype()) {
            found = pt.HashTable.FindEntry(key);
            if (found) return found;
        }
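The lookup pseudocode above, made runnable. This sketch models each object as a `Map` of own properties plus a `proto` link; `makeObject` and `findProperty` are hypothetical names standing in for an engine's internal hash tables.

```javascript
// Each modeled object: own properties in a hash table, plus a prototype link.
function makeObject(props, proto = null) {
  return { table: new Map(Object.entries(props)), proto };
}

// Slow-path property lookup: probe the object's table, then walk the chain.
function findProperty(obj, key) {
  for (let o = obj; o !== null; o = o.proto) {
    if (o.table.has(key)) return o.table.get(key);
  }
  return undefined;   // not found anywhere on the chain
}

const base = makeObject({ greet: "hello" });
const foo = makeObject({ bar: 42 }, base);
```

Every `foo.bar` on this path costs at least one hash probe, plus one more per prototype level, compared with the single `movl` in C.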
    • How to optimize?
    • First, we need to know the object layout
    • Adding a type (hidden class) to objects: each added property (add property x, then add property y) moves the object to a new hidden class. Source: http://code.google.com/apis/v8/design.html
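The hidden class idea can be sketched as a tree of shapes: each shape maps property names to slot indices, and adding a property follows (or creates) a transition to a child shape. This minimal sketch uses hypothetical names (`Shape`, `setProperty`), not V8's internal classes, and shows that two objects built the same way end up sharing one hidden class.

```javascript
// A shape (hidden class) maps property names to slot indices and
// remembers which child shape each newly added property leads to.
class Shape {
  constructor(parent = null, name = null) {
    this.slots = new Map(parent ? parent.slots : []);
    if (name !== null) this.slots.set(name, this.slots.size);
    this.transitions = new Map();
  }
  withProperty(name) {
    let next = this.transitions.get(name);
    if (next === undefined) {
      next = new Shape(this, name);       // first time: create the transition
      this.transitions.set(name, next);
    }
    return next;                          // later times: reuse it
  }
}

const EMPTY_SHAPE = new Shape();

// An object is a shape plus a flat array of property values.
function newObject() {
  return { shape: EMPTY_SHAPE, values: [] };
}

function setProperty(obj, name, value) {
  if (!obj.shape.slots.has(name)) {
    obj.shape = obj.shape.withProperty(name);   // layout change
  }
  obj.values[obj.shape.slots.get(name)] = value;
}

function Point(x, y) {
  const p = newObject();
  setProperty(p, "x", x);   // {} -> {x}
  setProperty(p, "y", y);   // {x} -> {x, y}
  return p;
}

const p1 = Point(1, 2);
const p2 = Point(3, 4);
const q = newObject();
setProperty(q, "y", 0);     // same properties, different order
setProperty(q, "x", 0);
```

`p1` and `p2` share a shape, while `q` ends up with a different one because its properties were added in a different order.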
    • Inline cache
      • Slow lookup the first time
      • Modify the JIT code in place
      • Next time, jump directly to the cached address
    • Inline cache made simple
        function fun(foo) {
            return foo.bar;
        }
      Before the first lookup, foo.bar compiles to:
        return foo.lookupProperty(bar);
      After patching:
        if (foo[hiddenClass] == 0xfe1) {
            return foo[indexOf_bar];
        }
        return foo.lookupProperty(bar);
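The patched code above can be mimicked with a closure that remembers the last hidden class it saw. This sketch assumes objects are modeled as `{ shape, values }`, where a shape is a `Map` from property name to slot index; `makeLoadIC` is a hypothetical name. A real inline cache patches machine code rather than updating variables, and this version is monomorphic: it caches one shape at a time.

```javascript
// Slow path: consult the shape's property table (the "lookupProperty" call).
function lookupProperty(obj, name) {
  const slot = obj.shape.get(name);
  return slot === undefined ? undefined : obj.values[slot];
}

// Builds a loader for one access site such as `foo.bar`.
function makeLoadIC(name) {
  let cachedShape = null;   // plays the role of 0xfe1 on the slide
  let cachedSlot = -1;      // plays the role of indexOf_bar
  const stats = { hits: 0, misses: 0 };
  function load(obj) {
    if (obj.shape === cachedShape) {
      stats.hits++;                      // fast path: one identity check, one indexed load
      return obj.values[cachedSlot];
    }
    stats.misses++;
    const slot = obj.shape.get(name);
    if (slot !== undefined) {            // "patch" the site for this shape
      cachedShape = obj.shape;
      cachedSlot = slot;
    }
    return lookupProperty(obj, name);
  }
  return { load, stats };
}

const shapeA = new Map([["x", 0], ["bar", 1]]);
const shapeB = new Map([["bar", 0]]);
const site = makeLoadIC("bar");
const r1 = site.load({ shape: shapeA, values: [10, 20] });  // miss, caches shapeA
const r2 = site.load({ shape: shapeA, values: [30, 40] });  // hit
const r3 = site.load({ shape: shapeB, values: [50] });      // miss, re-caches shapeB
```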
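    • JavaScript in real-world code is not that dynamic
      • delete operations account for only 0.1% (“An Analysis of the Dynamic Behavior of JavaScript...”)
      • 99% of primitive types can be determined through static analysis
      • 97% of property accesses can be handled by inline caches (“TypeCastor: Demystify Dynamic Typing of JavaScript...”)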
    • V8 can’t handle delete well yet: 20x slower! http://jsperf.com/test-v8-delete
    • Avoid altering object property layout
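What avoiding layout changes looks like in source code: create every property in the constructor, always in the same order, and clear a value by assigning `null` instead of using `delete`. This is a sketch of the coding pattern only; its speed effect depends on the engine and cannot be checked by an assertion, so the example just shows that the objects keep the same property set and order.

```javascript
// Every Point gets the same properties, added in the same order,
// so an engine can give all Points one hidden class.
class Point {
  constructor(x, y) {
    this.x = x;
    this.y = y;
    this.label = null;   // declared up front even when unused
  }
}

const p = new Point(1, 2);
const q = new Point(3, 4);
q.label = "start";

// Instead of `delete q.label` (which can change the object's layout),
// clear the value and keep the property.
q.label = null;
```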
    • Faster data structures & algorithms
    • Is array push faster than string concatenation?
    • http://jsperf.com/nwind-string-concat-vs-array-push
    • Why?
    • Other string optimizations
      • Adaptive string search
        • Single char, linear, Boyer-Moore-Horspool
      • Adaptive ASCII and UTF-8
      • Zero-copy substrings
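Boyer-Moore-Horspool, one of the search strategies listed above, skips ahead using a table of shift distances built from the pattern. This is a textbook sketch of that algorithm, not V8's implementation (which chooses among strategies adaptively); `horspoolSearch` is a hypothetical name.

```javascript
// Returns the index of the first occurrence of `pattern` in `text`, or -1.
function horspoolSearch(text, pattern) {
  const m = pattern.length;
  const n = text.length;
  if (m === 0) return 0;
  // Bad-character table: for each char in pattern[0..m-2],
  // the distance from its last occurrence to the end of the pattern.
  const shift = new Map();
  for (let i = 0; i < m - 1; i++) {
    shift.set(pattern[i], m - 1 - i);
  }
  let pos = 0;
  while (pos <= n - m) {
    // Compare right to left.
    let j = m - 1;
    while (j >= 0 && text[pos + j] === pattern[j]) j--;
    if (j < 0) return pos;
    // Shift by the entry for the text char under the pattern's last position.
    const last = text[pos + m - 1];
    pos += shift.has(last) ? shift.get(last) : m;
  }
  return -1;
}
```

On a mismatch the window can jump by up to the full pattern length, which is why this strategy pays off for longer patterns while a simple loop wins for single characters.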
    • Feel free to use strings in modern engines
    • Just-In-Time (JIT)
    • JIT
      • Method JIT, trace JIT, regular expression JIT
      • Register allocation
      • Code generation
    • How does a JIT work?
      • Allocate executable memory: mmap, or malloc plus mprotect
      • Generate native code into it
      • Cast the buffer to a function pointer: a C cast, or reinterpret_cast in C++
      • Call the function
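The generate-then-call steps above can be imitated from inside JavaScript by generating JavaScript source and turning it into a callable with the standard `Function` constructor; the engine's own JIT then compiles that function. This is only an analogy (no `mmap` or machine code is involved), and `compileToFunction` and its bytecode format are hypothetical.

```javascript
// Translate a tiny stack bytecode into one JS expression, then build a function.
// Assumed bytecode format: ["arg", i] | ["const", n] | ["add"] | ["mul"]
function compileToFunction(bytecode, arity) {
  const stack = [];                        // holds JS expression strings, not values
  for (const [op, operand] of bytecode) {
    switch (op) {
      case "arg": stack.push(`a${operand}`); break;
      case "const": stack.push(String(Number(operand))); break;
      case "add": { const b = stack.pop(), a = stack.pop(); stack.push(`(${a} + ${b})`); break; }
      case "mul": { const b = stack.pop(), a = stack.pop(); stack.push(`(${a} * ${b})`); break; }
      default: throw new Error(`unknown opcode ${op}`);
    }
  }
  if (stack.length !== 1) throw new Error("bytecode must leave exactly one value");
  const params = Array.from({ length: arity }, (_, i) => `a${i}`);
  const source = `return ${stack[0]};`;
  // "Cast" the generated code into something callable.
  return { source, fn: new Function(...params, source) };
}

// foo(bar) { return bar + 1; } from the bytecode slides
const { source, fn: foo } = compileToFunction([["arg", 0], ["const", 1], ["add"]], 1);
```

Here `source` is `return (a0 + 1);` and `foo(2)` returns 3, the same result the SpiderMonkey and JSC listings compute.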
    • V8
    • V8
      • Lars Bak
      • Hidden classes, PICs
      • Some built-in objects are written in JavaScript
      • Crankshaft
      • Precise generational GC
    • Lars Bak
      • Implementing VMs since 1988
      • Beta
      • Self
      • JVM (VM architect at Sun)
      • V8 (Google)
    • Figure: lines of code (VM only), split into .cpp/.c and .h files, for HotSpot, V8, SpiderMonkey, JSC, Ruby, CPython, and PHP-Zend
    • Crankshaft
    • Source code is first compiled to native code; runtime profiling of that code feeds Crankshaft, which goes high-level IR → low-level IR → optimized native code
    • Crankshaft
      • Profiling
      • Compiler optimization
      • Generate new JIT code
      • On-stack replacement
      • Deoptimization
    • High-level IR (Hydrogen)
      • AST to SSA
      • Type inference (type feedback from inline caches)
      • Compiler optimizations
        • Function inlining
        • Loop-invariant code motion, global value numbering
        • Dead phi elimination
        • ...
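Value numbering finds instructions that compute the same value and reuses the first result. This sketch shows the simpler local variant over one straight-line block, not Hydrogen's global version; it assumes a hypothetical three-address form `{ dst, op, args }` where every destination is assigned once, as in SSA, so a redundant instruction can be replaced by a copy.

```javascript
// Instructions: { dst, op, args }. Operands are variable names or numbers.
function localValueNumbering(block) {
  const seen = new Map();      // canonical expression key -> variable holding it
  const out = [];
  for (const ins of block) {
    // Commutative ops get sorted operands so a+b and b+a share a key.
    const args = ins.op === "add" || ins.op === "mul" ? [...ins.args].sort() : ins.args;
    const key = `${ins.op}(${args.join(",")})`;
    if (seen.has(key)) {
      out.push({ dst: ins.dst, op: "copy", args: [seen.get(key)] });  // reuse earlier result
    } else {
      seen.set(key, ins.dst);
      out.push(ins);
    }
  }
  return out;
}

const block = [
  { dst: "t1", op: "add", args: ["x", "y"] },
  { dst: "t2", op: "mul", args: ["t1", 2] },
  { dst: "t3", op: "add", args: ["y", "x"] },   // same value as t1
];
const optimized = localValueNumbering(block);
```

A later copy-propagation pass would then rewrite uses of `t3` to `t1`, leaving the copy dead.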
    • Loop-invariant code motion
      Before:
        for (i = 0; i < n; i++) {
            a[i] = x + y;
        }
      After:
        tmp = x + y;
        for (i = 0; i < n; i++) {
            a[i] = tmp;
        }
    • Current function inlining limits: a function is not inlined if it
      • is big (larger than 600 bytes)
      • is recursive
      • contains unsupported statements
        • with, switch
        • try/catch/finally
        • ...
    • Avoid “with”, “switch”, and “try” in hot paths
    • Built-in objects written in JS
        function ArraySort(comparefn) {
            ...
            // In-place QuickSort algorithm.
            // For short (length <= 22) arrays, insertion sort is used for efficiency.
            if (!IS_SPEC_FUNCTION(comparefn)) {
                comparefn = function (x, y) {
                    if (x === y) return 0;
                    if (%_IsSmi(x) && %_IsSmi(y)) {
                        return %SmiLexicographicCompare(x, y);
                    }
                    x = ToString(x);
                    y = ToString(y);
                    if (x == y) return 0;
                    else return x < y ? -1 : 1;
                };
            }
            ...
        }
      (v8/src/array.js)
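The comments in `ArraySort` describe a hybrid: quicksort, falling back to insertion sort for short ranges (length <= 22), with a default comparator that compares elements as strings. A simplified sketch of that combination in plain JavaScript; the threshold 22 comes from the comment above, while the middle-element pivot and Hoare-style partition are assumptions. Unlike the built-in, it does not handle `undefined` elements or holes, and it assumes a consistent comparator.

```javascript
// Default comparator: compare as strings, like the fallback in ArraySort.
function defaultCompare(x, y) {
  if (x === y) return 0;
  x = String(x);
  y = String(y);
  if (x === y) return 0;
  return x < y ? -1 : 1;
}

function insertionSort(a, from, to, cmp) {   // sorts a[from..to) in place
  for (let i = from + 1; i < to; i++) {
    const el = a[i];
    let j = i - 1;
    while (j >= from && cmp(a[j], el) > 0) {
      a[j + 1] = a[j];
      j--;
    }
    a[j + 1] = el;
  }
}

function quickSort(a, from, to, cmp) {       // sorts a[from..to) in place
  while (to - from > 22) {
    const pivot = a[from + ((to - from) >> 1)];
    let i = from, j = to - 1;
    while (i <= j) {                          // Hoare-style partition
      while (cmp(a[i], pivot) < 0) i++;
      while (cmp(a[j], pivot) > 0) j--;
      if (i <= j) {
        [a[i], a[j]] = [a[j], a[i]];
        i++;
        j--;
      }
    }
    // Recurse into the smaller side, keep looping on the larger one.
    if (j + 1 - from < to - i) {
      quickSort(a, from, j + 1, cmp);
      from = i;
    } else {
      quickSort(a, i, to, cmp);
      to = j + 1;
    }
  }
  insertionSort(a, from, to, cmp);           // short ranges: insertion sort
}

function arraySort(a, comparefn) {
  const cmp = typeof comparefn === "function" ? comparefn : defaultCompare;
  quickSort(a, 0, a.length, cmp);
  return a;
}
```

The string-based default is why `[10, 9, 1].sort()` yields `[1, 10, 9]` in JavaScript.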
    • GC
      • Precise
      • Stop-the-world
      • Generational
      • Incremental (2011-10)
    • V8 performance
    • V8 performance: why?
    • V8 performance: unfair, they are using the GMP library
    • Warning: benchmarks lie
    • Node.js
      • Pros
        • Easy to write async I/O
        • One language for everything
        • Maybe faster than PHP, Python
        • Betting on JavaScript is safe
      • Cons
        • Lack of great libraries
        • Large JS code bases are hard to maintain
        • Easy to leak memory (compared to PHP, Erlang)
        • Still too young, unproven
    • Why Dart?
      • Built for large applications
        • Optional types, structure, libraries, tools
      • Performance
        • Lightweight processes like Erlang
        • Easier to write a VM that is faster than a JavaScript VM
    • The future of Dart?
      • It will not replace JS
      • But it may replace GWT and become a better choice for building large front-end applications
        • with a great IDE and mature libraries
        • and some way to communicate with JavaScript
    • How to make JavaScript faster?
    • How to make JavaScript faster?
      • Wait for ES6: StructType, const, WeakMap, yield...
      • High-performance built-in libraries
      • WebCL
      • Embed another language
        • KL (Fabric Engine), GLSL (WebGL)
      • Wait for quantum computers :)
    • Other things worth learning
      • NaN tagging
      • Polymorphic inline caches
      • Type inference
      • Regex JIT
      • Runtime optimization
      • ...
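NaN tagging (often called NaN-boxing), the first item above, hides non-double values inside the unused payload bits of a NaN, so every value fits in 64 bits. This sketch assumes a made-up tag (`0x7ff9` in the top 16 bits) meaning "int32 in the low 32 bits". Values are kept as `BigInt` bit patterns rather than JS numbers, because engines are allowed to canonicalize NaN payloads when a NaN passes through a number.

```javascript
// Every value is a 64-bit pattern, held here as a BigInt.
const scratch = new DataView(new ArrayBuffer(8));
const CANONICAL_NAN = 0x7ff8000000000000n;  // the one NaN pattern real doubles use here
const INT_TAG = 0x7ff9000000000000n;        // hypothetical tag: "int32 in the low bits"
const TAG_MASK = 0xffff000000000000n;

function boxDouble(d) {
  if (Number.isNaN(d)) return CANONICAL_NAN;  // so no real NaN can look tagged
  scratch.setFloat64(0, d);
  return scratch.getBigUint64(0);
}

function boxInt32(i) {
  if (!Number.isInteger(i) || i < -(2 ** 31) || i > 2 ** 31 - 1) {
    throw new RangeError(`${i} is not an int32`);
  }
  return INT_TAG | BigInt(i >>> 0);           // payload: 32-bit two's complement
}

function isInt32(bits) {
  return (bits & TAG_MASK) === INT_TAG;
}

function unbox(bits) {
  if (isInt32(bits)) {
    return Number(BigInt.asIntN(32, bits & 0xffffffffn));
  }
  scratch.setBigUint64(0, bits);
  return scratch.getFloat64(0);
}
```

Any pattern whose top 16 bits are `0x7ff9` is a NaN as a double, so once real NaNs are canonicalized, ordinary doubles and tagged integers cannot be confused.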
    • References
      • The Behavior of Efficient Virtual Machine Interpreters on Modern Architectures
      • Virtual Machine Showdown: Stack Versus Registers
      • The Implementation of Lua 5.0
      • Why Is the New Google V8 Engine so Fast?
      • Context Threading: A Flexible and Efficient Dispatch Technique for Virtual Machine Interpreters
      • Effective Inline-Threaded Interpretation of Java Bytecode Using Preparation Sequences
      • Smalltalk-80: The Language and Its Implementation
      • Design of the Java HotSpot™ Client Compiler for Java 6
      • Oracle JRockit: The Definitive Guide
      • Virtual Machines: Versatile Platforms for Systems and Processes
      • Fast and Precise Hybrid Type Inference for JavaScript
      • LLVM: A Compilation Framework for Lifelong Program Analysis & Transformation
      • Emscripten: An LLVM-to-JavaScript Compiler
      • An Analysis of the Dynamic Behavior of JavaScript Programs
      • Adaptive Optimization for SELF
      • Bytecodes Meet Combinators: invokedynamic on the JVM
      • Efficient Implementation of the Smalltalk-80 System
      • Design, Implementation, and Evaluation of Optimizations in a Just-In-Time Compiler
      • Optimizing Direct Threaded Code by Selective Inlining
      • Linear Scan Register Allocation
      • Optimizing Invokedynamic
      • Threaded Code
      • Why Not a Bytecode VM?
      • A Survey of Adaptive Optimization in Virtual Machines
      • An Efficient Implementation of SELF, a Dynamically-Typed Object-Oriented Language Based on Prototypes
      • Making the Compilation “Pipeline” Explicit: Dynamic Compilation Using Trace Tree Specialization
      • Uniprocessor Garbage Collection Techniques
      • Representing Type Information in Dynamically Typed Languages
      • Trace-based Just-in-Time Type Specialization for Dynamic Languages
      • The Structure and Performance of Efficient Interpreters
      • Know Your Engines: How to Make Your JavaScript Fast
      • IE Blog, Chromium Blog, WebKit Blog, Opera Blog, Mozilla Blog, Wingolog’s Blog, RednaxelaFX’s Blog, David Mandelin’s Blog, Brendan Eich’s Blog...
    • Thank you