JavaScript on the GPU

Jarred Nicholls
Principal Software Architect at GOEFER | IronNet Cybersecurity
If you don’t get this ref...shame on you
Jarred Nicholls
  @jarrednicholls
jarred@webkit.org
Work @ Sencha
Web Platform Team

Doing webkitty things...
WebKit Committer
Co-Author, W3C Web Cryptography API
JavaScript on the GPU
What I’ll blabber about today
Why JavaScript on the GPU
Running JavaScript on the GPU

What’s to come...
Why JavaScript on the GPU?

Better question: Why a GPU?

A: They’re fast! (well, at certain things...)
GPUs are fast b/c...
Totally different paradigm from CPUs
Data parallelism vs. Task parallelism
Stream processing vs. Sequential processing
    GPUs can divide-and-conquer
Hardware capable of a large number of “threads”
    e.g. ATI Radeon HD 6770m:
    480 stream processing units == 480 cores
Typically very high memory bandwidth
Many, many GigaFLOPs
GPUs don’t solve all problems
Not all tasks can be accelerated by GPUs
Tasks must be parallelizable, i.e.:
    Side effect free
    Homogeneous and/or streamable
Overall tasks will become limited by Amdahl’s Law
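To put a number on Amdahl's Law, here is a minimal sketch. The function name `amdahlSpeedup` and the 90% parallel fraction are assumptions chosen for illustration; only the 480-core figure comes from the slides.

```javascript
// Amdahl's Law: p is the fraction of the work that can run in parallel,
// n is the number of parallel processors.
function amdahlSpeedup(p, n) {
  if (p < 0 || p > 1 || n < 1) {
    throw new RangeError("need 0 <= p <= 1 and n >= 1");
  }
  return 1 / ((1 - p) + p / n);
}

// Assumed workload: 90% parallelizable, spread over the 480 stream
// processors of the Radeon HD 6770m mentioned earlier.
console.log(amdahlSpeedup(0.9, 480).toFixed(2)); // prints "9.82"
```

Even with 480 cores, the 10% serial part caps the speedup below 10x, which is why the task as a whole has to be parallelizable.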
JavaScript on the GPU
Let’s find out...
Experiment
Code Name “LateralJS”
LateralJS

Our Mission
To make JavaScript a first-class citizen on all GPUs
and take advantage of hardware accelerated
operations & data parallelization.
Our Options
          OpenCL                 Nvidia CUDA
AMD, Nvidia, Intel, etc.   Nvidia only
A shitty version of C99    C++ (C for CUDA)
No dynamic memory          Dynamic memory
No recursion               Recursion
No function pointers       Function pointers
Terrible tooling           Great dev. tooling
Immature (arguably)        More mature (arguably)
Why not a Static Compiler?
We want full JavaScript support
    Object / prototype
    Closures
    Recursion
    Functions as objects
    Variable typing
Type Inference limitations
Reasonably limited to size and complexity of “kernel-esque” functions
Not nearly insane enough
Why an Interpreter?
We want it all baby - full JavaScript support!
Most insane approach
Challenging to make it good, but holds a lot of promise
OpenCL Headaches
Oh the agony...
Multiple memory spaces - pointer hell
No recursion - all inlined functions
No standard libc libraries
No dynamic memory
No standard data structures - apart from vector ops
Buggy ass AMD/Nvidia compilers
Multiple Memory Spaces
In order from fastest to slowest:
space      speed                          description
private    very fast                      stream processor cache (~64KB); scoped to a single work item
local      fast                           ~= L1 cache on CPUs (~64KB); scoped to a single work group
global,    slow, by orders of magnitude   ~= system memory over slow bus; available to all work groups/items;
constant                                  all the VRAM on the card (MBs)
Memory Space Pointer Hell
global uchar* gptr = 0x1000;
local uchar* lptr = (local uchar*) gptr; // FAIL!
uchar* pptr = (uchar*) gptr; // FAIL! private is implicit


Diagram: the same address, 0x1000, shown in each of the global, local, and private spaces.
0x1000 points to something different depending on the address space!
Memory Space Pointer Hell
           Pointers must always be fully qualified
                Macros to help ease the pain

#define   GPTR(TYPE)   global TYPE*
#define   CPTR(TYPE)   constant TYPE*
#define   LPTR(TYPE)   local TYPE*
#define   PPTR(TYPE)   private TYPE*
No Recursion!?!?!?
  No call stack
  All functions are inlined to the kernel function


uint factorial(uint n) {
    if (n <= 1)
         return 1;
    else
         return n * factorial(n - 1); // compile-time error
}
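Without a call stack, recursion has to be rewritten around an explicit stack that the code manages itself. The sketch below shows the idea in JavaScript for brevity; it is not code from LateralJS, and in OpenCL C the stack would presumably be a fixed-size array rather than a growable one.

```javascript
// factorial() without recursion: the pending multiplications that the
// call stack would normally hold are kept in an explicit array instead.
function factorial(n) {
  const stack = [];
  while (n > 1) {
    stack.push(n); // "call" factorial(n - 1), remembering n for later
    n--;
  }
  let result = 1;  // base case: factorial(1) and factorial(0) are 1
  while (stack.length > 0) {
    result *= stack.pop(); // "return" through each pending frame
  }
  return result;
}

console.log(factorial(5)); // prints 120
```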
No standard libc libraries
memcpy?
strcpy?
strcmp?
etc...
No standard libc libraries
Implement our own
#define MEMCPY(NAME, DEST_AS, SRC_AS) \
    DEST_AS void* NAME(DEST_AS void*, SRC_AS const void*, uint); \
    DEST_AS void* NAME(DEST_AS void* dest, SRC_AS const void* src, uint size) { \
        DEST_AS uchar* cDest = (DEST_AS uchar*)dest; \
        SRC_AS const uchar* cSrc = (SRC_AS const uchar*)src; \
        for (uint i = 0; i < size; i++) \
            cDest[i] = cSrc[i]; \
        return (DEST_AS void*)cDest; \
    }
PTR_MACRO_DEST_SRC(MEMCPY, memcpy)


Produces:
memcpy_g    memcpy_gc   memcpy_lc   memcpy_pc
memcpy_l    memcpy_gl   memcpy_lg   memcpy_pg
memcpy_p    memcpy_gp   memcpy_lp   memcpy_pl
No dynamic memory
No malloc()
No free()
What to do...
Yes! dynamic memory
  Create a large buffer of global memory - our “heap”
  Implement our own malloc() and free()
  Create a handle structure - “virtual memory”
  P(T, hnd) macro to get the current pointer address

GPTR(handle) hnd = malloc(sizeof(uint));
GPTR(uint) ptr = P(uint, hnd);
*ptr = 0xdeadbeef;
free(hnd);
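The handle idea can be sketched as follows. This is an illustration under assumptions, not the LateralJS allocator: the class `HandleHeap`, its bump allocation, and the `compact()` step are all hypothetical, and the guess that handles exist so blocks can be moved is inferred from the phrase "current pointer address".

```javascript
// A heap carved out of one big buffer. malloc() returns a handle
// (an index into a table), never a raw address.
class HandleHeap {
  constructor(bytes) {
    this.view = new DataView(new ArrayBuffer(bytes));
    this.top = 0;       // bump pointer: next free byte
    this.handles = [];  // handle -> { offset, size, live }
  }

  malloc(size) {
    const aligned = (size + 3) & ~3; // keep blocks 4-byte aligned
    if (this.top + aligned > this.view.byteLength) {
      throw new Error("out of memory");
    }
    this.handles.push({ offset: this.top, size: aligned, live: true });
    this.top += aligned;
    return this.handles.length - 1;
  }

  // Counterpart of the P(T, hnd) macro: the current address behind a handle.
  addr(hnd) {
    const h = this.handles[hnd];
    if (!h || !h.live) throw new Error("invalid handle");
    return h.offset;
  }

  free(hnd) {
    this.handles[hnd].live = false;
  }

  // Slide live blocks down over freed ones. Addresses change, handles do not.
  compact() {
    const bytes = new Uint8Array(this.view.buffer);
    let top = 0;
    for (const h of this.handles) { // table is in ascending offset order
      if (!h.live) continue;
      bytes.copyWithin(top, h.offset, h.offset + h.size);
      h.offset = top;
      top += h.size;
    }
    this.top = top;
  }
}

const heap = new HandleHeap(64);
const first = heap.malloc(4);
const second = heap.malloc(4);
heap.view.setUint32(heap.addr(second), 0xdeadbeef);
heap.free(first);
heap.compact();
console.log(heap.addr(second), heap.view.getUint32(heap.addr(second)).toString(16));
// prints "0 deadbeef"
```

Because callers hold handles rather than addresses, the block behind `second` can move from offset 4 to offset 0 without invalidating anything.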
Ok, we get the point...
        FYL!
High-level Architecture

Host:  V8, Esprima Parser, Data Serializer & Marshaller, Device Mgr
GPUs:  Data Heap, Stack-based Interpreter, Garbage Collector

Flow:
    1. eval(code);
    2. Build JSON AST
    3. Serialize AST (JSON => C Structs)
    4. Ship to GPU to Interpret
    5. Fetch Result
AST Generation

JavaScript Source => Esprima in V8 => JSON AST (v8::Object) => Lateral AST (C structs)
Embed esprima.js

              Resource Generator

$ resgen esprima.js resgen_esprima_js.c
Embed esprima.js

                       resgen_esprima_js.c
const unsigned char resgen_esprima_js[]   = {
    0x2f, 0x2a, 0x0a, 0x20, 0x20, 0x43,   0x6f, 0x70, 0x79, 0x72,
    0x69, 0x67, 0x68, 0x74, 0x20, 0x28,   0x43, 0x29, 0x20, 0x32,
    ...
    0x20, 0x3a, 0x20, 0x2a, 0x2f, 0x0a,   0x0a, 0
};
Embed esprima.js
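ASTGenerator.cpp
// Defined in the generated resgen_esprima_js.c as a NUL-terminated byte array.
extern const unsigned char resgen_esprima_js[];

void ASTGenerator::init()
{
    HandleScope scope;
    s_context = Context::New();
    s_context->Enter();
    // Compile and run the embedded esprima.js source inside our context.
    Handle<Script> script = Script::Compile(String::New(reinterpret_cast<const char*>(resgen_esprima_js)));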
    script->Run();
    s_context->Exit();
    s_initialized = true;
}
Build JSON AST

                    e.g.
ASTGenerator::esprimaParse(
    "var xyz = new Array(10);"
);
Build JSON AST
Handle<Object> ASTGenerator::esprimaParse(const char* javascript)
{
    if (!s_initialized)
        init();


    HandleScope scope;
    s_context->Enter();
    Handle<Object> global = s_context->Global();
    Handle<Object> esprima = Handle<Object>::Cast(global->Get(String::New("esprima")));
    Handle<Function> esprimaParse = Handle<Function>::Cast(esprima->Get(String::New("parse")));
    Handle<String> code = String::New(javascript);
    Handle<Object> ast = Handle<Object>::Cast(esprimaParse->Call(esprima, 1, (Handle<Value>*)&code));


    s_context->Exit();
    return scope.Close(ast);
}
Build JSON AST
{
    "type": "VariableDeclaration",
    "declarations": [
        {
            "type": "VariableDeclarator",
            "id": {
                "type": "Identifier",
                "name": "xyz"
            },
            "init": {
                "type": "NewExpression",
                "callee": {
                    "type": "Identifier",
                    "name": "Array"
                },
                "arguments": [
                    {
                        "type": "Literal",
                        "value": 10
                    }
                ]
            }
        }
    ],
    "kind": "var"
}
Lateral AST structs
// Structs shared between Host and OpenCL
#ifdef __OPENCL_VERSION__
#define CL(TYPE) TYPE
#else
#define CL(TYPE) cl_##TYPE
#endif

typedef struct ast_type_st {
    CL(uint) id;
    CL(uint) size;
} ast_type;

typedef struct ast_program_st {
    ast_type type;
    CL(uint) body;
    CL(uint) numBody;
} ast_program;

typedef struct ast_identifier_st {
    ast_type type;
    CL(uint) name;
} ast_identifier;
Lateral AST structs

v8::Object => ast_type (expanded)
ast_type* vd1_1_init_id = (ast_type*)astCreateIdentifier("Array");
ast_type* vd1_1_init_args[1];
vd1_1_init_args[0] = (ast_type*)astCreateNumberLiteral(10);
ast_type* vd1_1_init = (ast_type*)astCreateNewExpression(vd1_1_init_id, vd1_1_init_args, 1);
free(vd1_1_init_id);
for (int i = 0; i < 1; i++)
    free(vd1_1_init_args[i]);
ast_type* vd1_1_id = (ast_type*)astCreateIdentifier("xyz");
ast_type* vd1_decls[1];
vd1_decls[0] = (ast_type*)astCreateVariableDeclarator(vd1_1_id, vd1_1_init);
free(vd1_1_id);
free(vd1_1_init);
ast_type* vd1 = (ast_type*)astCreateVariableDeclaration(vd1_decls, 1, "var");
for (int i = 0; i < 1; i++)
    free(vd1_decls[i]);
Lateral AST structs
                          astCreateIdentifier
ast_identifier* astCreateIdentifier(const char* str) {
    CL(uint) size = sizeof(ast_identifier) + rnd(strlen(str) + 1, 4);
    ast_identifier* ast_id = (ast_identifier*)malloc(size);

    // copy the string
    strcpy((char*)(ast_id + 1), str);

    // fill the struct
    ast_id->type.id = AST_IDENTIFIER;
    ast_id->type.size = size;
    ast_id->name = sizeof(ast_identifier); // offset

    return ast_id;
}
Lateral AST structs
         astCreateIdentifier(“xyz”)
offset      field              value
  0        type.id    AST_IDENTIFIER (0x01)
  4       type.size             16
  8        name             12 (offset)
 12        str[0]               ‘x’
 13        str[1]               ‘y’
 14        str[2]               ‘z’
 15        str[3]              ‘\0’
Lateral AST structs
                                  astCreateNewExpression
ast_expression_new* astCreateNewExpression(ast_type* callee, ast_type** arguments, int numArgs) {
    CL(uint) size = sizeof(ast_expression_new) + callee->size;
    for (int i = 0; i < numArgs; i++)
        size += arguments[i]->size;

    ast_expression_new* ast_new = (ast_expression_new*)malloc(size);
    ast_new->type.id = AST_NEW_EXPR;
    ast_new->type.size = size;

    CL(uint) offset = sizeof(ast_expression_new);
    char* dest = (char*)ast_new;

    // copy callee
    memcpy(dest + offset, callee, callee->size);
    ast_new->callee = offset;
    offset += callee->size;

    // copy arguments
    if (numArgs) {
        ast_new->arguments = offset;
        for (int i = 0; i < numArgs; i++) {
            ast_type* arg = arguments[i];
            memcpy(dest + offset, arg, arg->size);
            offset += arg->size;
        }
    } else
        ast_new->arguments = 0;
    ast_new->numArguments = numArgs;

    return ast_new;
}
Lateral AST structs
                 new Array(10)
offset   field            value
  0      type.id          AST_NEW_EXPR (0x308)
  4      type.size        52
  8      callee           20 (offset)
 12      arguments        40 (offset)
 16      numArguments     1
 20      callee node      ast_identifier (“Array”)
 40      arguments node   ast_literal_number (10)
Lateral AST structs
Shared across the Host and the OpenCL runtime
    Host writes, Lateral reads
Constructed on Host as contiguous blobs
    Easy to send to GPU: memcpy(gpu, ast, ast->size);
    Fast to send to GPU, single buffer write
    Simple to traverse w/ pointer arithmetic
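The contiguous-blob layout can be mimicked in JavaScript with an `ArrayBuffer`, which may make the offsets easier to see. This is a sketch, not the host code: `createIdentifier` and `readIdentifierName` are hypothetical names, little-endian byte order is assumed, and only the 12-byte header layout and the `AST_IDENTIFIER` value of 0x01 come from the slides.

```javascript
const AST_IDENTIFIER = 0x01; // value taken from the layout table
const HEADER_SIZE = 12;      // type.id, type.size, name: three 32-bit uints

const roundUp = (n, multiple) => Math.ceil(n / multiple) * multiple;

// Build one identifier node as a self-contained byte blob.
function createIdentifier(name) {
  const size = HEADER_SIZE + roundUp(name.length + 1, 4); // +1 for the NUL
  const blob = new ArrayBuffer(size);                      // zero-filled
  const view = new DataView(blob);
  view.setUint32(0, AST_IDENTIFIER, true); // type.id
  view.setUint32(4, size, true);           // type.size
  view.setUint32(8, HEADER_SIZE, true);    // name: offset from node start
  for (let i = 0; i < name.length; i++) {
    view.setUint8(HEADER_SIZE + i, name.charCodeAt(i)); // ASCII only
  }
  return blob;
}

// Read the name back using nothing but the offsets stored in the blob.
function readIdentifierName(blob, base = 0) {
  const view = new DataView(blob);
  let p = base + view.getUint32(base + 8, true);
  let out = "";
  for (let c = view.getUint8(p); c !== 0; c = view.getUint8(++p)) {
    out += String.fromCharCode(c);
  }
  return out;
}

const node = createIdentifier("xyz");
console.log(node.byteLength, readIdentifierName(node)); // prints "16 xyz"
```

Storing offsets instead of pointers is what lets the same bytes be valid on both sides: the blob can be copied to any address, in any memory space, and still be traversed.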
Stack-based Interpreter
Building Blocks
    JS Type Structs
    Lateral State
    Heap
    Symbol/Ref Table
    AST Traverse Stack
    Call/Exec Stack
    Return Stack
    Scope Stack
    AST Traverse Loop
    Interpret Loop
Kernels
#include "state.h"
#include "jsvm/asttraverse.h"
#include "jsvm/interpreter.h"

// Setup VM structures
kernel void lateral_init(GPTR(uchar) lateral_heap) {
    LATERAL_STATE_INIT
}

// Interpret the AST
kernel void lateral(GPTR(uchar) lateral_heap, GPTR(ast_type) lateral_ast) {
    LATERAL_STATE

    ast_push(lateral_ast);
    while (!Q_EMPTY(lateral_state->ast_stack, ast_q) || !Q_EMPTY(lateral_state->call_stack, call_q)) {
        while (!Q_EMPTY(lateral_state->ast_stack, ast_q))
            traverse();
        if (!Q_EMPTY(lateral_state->call_stack, call_q))
            interpret();
    }
}
Let’s interpret...



 var x = 1 + 2;
var x = 1 + 2;
{
    "type": "VariableDeclaration",
    "declarations": [
        {
            "type": "VariableDeclarator",
            "id": {
                "type": "Identifier",
                "name": "x"
            },
            "init": {
                "type": "BinaryExpression",
                "operator": "+",
                "left": {
                    "type": "Literal",
                    "value": 1
                },
                "right": {
                    "type": "Literal",
                    "value": 2
                }
            }
        }
    ],
    "kind": "var"
}

Stack contents after each step (bottom of each stack on the left, top on the right):

step  AST stack                 Call stack                                Return stack
 1    (empty)                   (empty)                                   (empty)
 2    VarDecl                   (empty)                                   (empty)
 3    VarDtor                   (empty)                                   (empty)
 4    Ident, Binary             VarDtor                                   (empty)
 5    Ident, Literal, Literal   VarDtor, Binary                           (empty)
 6    Ident, Literal            VarDtor, Binary, Literal                  (empty)
 7    Ident                     VarDtor, Binary, Literal, Literal         (empty)
 8    (empty)                   VarDtor, Binary, Literal, Literal, Ident  (empty)
 9    (empty)                   VarDtor, Binary, Literal, Literal         “x”
10    (empty)                   VarDtor, Binary, Literal                  “x”, 1
11    (empty)                   VarDtor, Binary                           “x”, 1, 2
12    (empty)                   VarDtor                                   “x”, 3
13    (empty)                   (empty)                                   (empty)
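The walkthrough of `var x = 1 + 2;` can be reproduced with a small three-stack interpreter. This is a hypothetical miniature written for illustration, not the LateralJS kernel: it handles only the four node types in this one statement, supports only `+`, and uses a plain object where the real thing would have a symbol table.

```javascript
const ast = {
  type: "VariableDeclaration",
  kind: "var",
  declarations: [{
    type: "VariableDeclarator",
    id: { type: "Identifier", name: "x" },
    init: {
      type: "BinaryExpression",
      operator: "+",
      left: { type: "Literal", value: 1 },
      right: { type: "Literal", value: 2 },
    },
  }],
};

function run(root) {
  const astStack = [root]; // nodes still to be visited
  const callStack = [];    // nodes waiting to be executed
  const returnStack = [];  // intermediate results
  const scope = {};        // stand-in for the symbol table

  // Traverse: move a node to the call stack and queue its children.
  function traverse() {
    const node = astStack.pop();
    switch (node.type) {
      case "VariableDeclaration":
        for (const d of node.declarations) astStack.push(d);
        break;
      case "VariableDeclarator":
        callStack.push(node);
        astStack.push(node.id, node.init);
        break;
      case "BinaryExpression":
        callStack.push(node);
        astStack.push(node.left, node.right);
        break;
      case "Identifier":
      case "Literal":
        callStack.push(node);
        break;
      default:
        throw new Error("unsupported node " + node.type);
    }
  }

  // Interpret: execute the node on top of the call stack.
  function interpret() {
    const node = callStack.pop();
    switch (node.type) {
      case "Identifier":
        returnStack.push(node.name);
        break;
      case "Literal":
        returnStack.push(node.value);
        break;
      case "BinaryExpression": {
        if (node.operator !== "+") throw new Error("unsupported operator");
        const right = returnStack.pop();
        const left = returnStack.pop();
        returnStack.push(left + right);
        break;
      }
      case "VariableDeclarator": {
        const value = returnStack.pop();
        const name = returnStack.pop();
        scope[name] = value;
        break;
      }
    }
  }

  // Same shape as the loop in the lateral kernel.
  while (astStack.length > 0 || callStack.length > 0) {
    while (astStack.length > 0) traverse();
    if (callStack.length > 0) interpret();
  }
  return scope;
}

console.log(run(ast).x); // prints 3
```

Every step is a push or a pop, which hints at why one short statement turns into so much stack traffic.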
Benchmark

                 Small loop of FLOPs
var input = new Array(10);
for (var i = 0; i < input.length; i++) {
    input[i] = Math.pow((i + 1) / 1.23, 3);
}
Execution Time
runtime             device                     time
Lateral (GPU CL)    ATI Radeon 6770m           116.571533ms
Lateral (CPU CL)    Intel Core i7 4x2.4GHz     0.226007ms
V8                  Intel Core i7 4x2.4GHz     0.090664ms
What went wrong?
Everything
Stack-based AST Interpreter, no optimizations
Heavy global memory access, no optimizations
No data or task parallelism
Stack-based Interpreter
Slow as molasses
Memory hog Eclipse style
Heavy memory access
     “var x = 1 + 2;” == 30 stack hits alone!
     Too much dynamic allocation
No inline optimizations, just following the yellow brick AST
Straight up lazy

Replace with something better!
Bytecode compiler on Host
Bytecode register-based interpreter on Device
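For contrast with the stack-based walkthrough, here is what a register-based bytecode for `var x = 1 + 2;` might look like. The opcodes, the tuple encoding, and the `execute` function are all invented for this sketch; the slides do not specify a bytecode format.

```javascript
// Hypothetical opcodes for illustration only.
const OP = { LOADK: 0, ADD: 1, STORE: 2, HALT: 3 };

// What a host-side compiler could emit for `var x = 1 + 2;`,
// encoded as [opcode, a, b, c] tuples.
const program = {
  constants: [1, 2],
  names: ["x"],
  code: [
    [OP.LOADK, 0, 0, 0], // r0 = constants[0]
    [OP.LOADK, 1, 1, 0], // r1 = constants[1]
    [OP.ADD,   2, 0, 1], // r2 = r0 + r1
    [OP.STORE, 0, 2, 0], // names[0] = r2
    [OP.HALT,  0, 0, 0],
  ],
};

function execute(prog) {
  const regs = new Float64Array(8); // fixed register file, no per-op allocation
  const globals = {};
  for (let pc = 0; pc < prog.code.length; pc++) {
    const [op, a, b, c] = prog.code[pc];
    switch (op) {
      case OP.LOADK: regs[a] = prog.constants[b]; break;
      case OP.ADD:   regs[a] = regs[b] + regs[c]; break;
      case OP.STORE: globals[prog.names[a]] = regs[b]; break;
      case OP.HALT:  return globals;
      default: throw new Error("bad opcode " + op);
    }
  }
  return globals;
}

console.log(execute(program).x); // prints 3
```

Four instructions replace the dozen or so traverse and interpret steps, and the register file is the kind of small fixed-size state that could plausibly live in private memory.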
Too much global access
   Everything is dynamically allocated to global memory
   Register based interpreter & bytecode compiler can
   make better use of local and private memory
Optimizing memory access yields crazy results:

// 11.1207 seconds
size_t tid = get_global_id(0);
c[tid] = a[tid];
while(b[tid] > 0) { // touch global memory on each loop
  b[tid]--; // touch global memory on each loop
  c[tid]++; // touch global memory on each loop
}

// 0.0445558 seconds!! HOLY SHIT!
size_t tid = get_global_id(0);
int tmp = a[tid]; // temp private variable
for(int i=b[tid]; i > 0; i--) tmp++; // touch private variables on each loop
c[tid] = tmp; // touch global memory one time
No data or task parallelism
  Everything being interpreted in a single “thread”
  We have hundreds of cores available to us!
  Build in heuristics
         Identify side-effect free statements
         Break into parallel tasks - very magical

var input = new Array(10);
for (var i = 0; i < input.length; i++) {
    input[i] = Math.pow((i + 1) / 1.23, 3);
}

becomes

input[0] = Math.pow((0 + 1) / 1.23, 3);
input[1] = Math.pow((1 + 1) / 1.23, 3);
...
input[9] = Math.pow((9 + 1) / 1.23, 3);
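The reason this loop can be split up is that each iteration depends only on its own index. A rough way to see that, assuming nothing about how Lateral would actually schedule the work, is to pull the body out into a function of the index and run the iterations in a different order. The names `kernel`, `sequential`, and `anyOrder` are made up for this sketch.

```javascript
// The loop body, rewritten as a function of the index alone.
// It reads no shared state and writes nothing, so it is side-effect free.
const kernel = (i) => Math.pow((i + 1) / 1.23, 3);

// Sequential reference, same as the benchmark loop.
function sequential(n) {
  const out = new Array(n);
  for (let i = 0; i < n; i++) out[i] = kernel(i);
  return out;
}

// Stand-in for parallel execution: visit the work items in reverse.
// On a GPU each id would be one work item, with get_global_id(0)
// playing the role of `id`.
function anyOrder(n) {
  const out = new Array(n);
  const ids = Array.from({ length: n }, (_, i) => i).reverse();
  for (const id of ids) out[id] = kernel(id);
  return out;
}

const a = sequential(10);
const b = anyOrder(10);
console.log(a.every((v, i) => v === b[i])); // prints true
```

If the results match regardless of order, the iterations are independent and can be handed to separate cores.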
What’s in store
Acceptable performance on all CL devices
V8/Node extension to launch Lateral tasks
High-level API to perform map-reduce, etc.
Lateral-cluster...mmmmm
Thanks!

  Jarred Nicholls
  @jarrednicholls
jarred@webkit.org
1 of 84

Recommended

Don't Be Afraid of Abstract Syntax Trees by
Don't Be Afraid of Abstract Syntax TreesDon't Be Afraid of Abstract Syntax Trees
Don't Be Afraid of Abstract Syntax TreesJamund Ferguson
3.3K views99 slides
AST - the only true tool for building JavaScript by
AST - the only true tool for building JavaScriptAST - the only true tool for building JavaScript
AST - the only true tool for building JavaScriptIngvar Stepanyan
13.7K views40 slides
Esprima - What is that by
Esprima - What is thatEsprima - What is that
Esprima - What is thatAbhijeet Pawar
3.1K views11 slides
Your code is not a string by
Your code is not a stringYour code is not a string
Your code is not a stringIngvar Stepanyan
1.5K views51 slides
Rust ⇋ JavaScript by
Rust ⇋ JavaScriptRust ⇋ JavaScript
Rust ⇋ JavaScriptIngvar Stepanyan
10.2K views108 slides
ES6 PPT FOR 2016 by
ES6 PPT FOR 2016ES6 PPT FOR 2016
ES6 PPT FOR 2016Manoj Kumar
2.8K views129 slides

More Related Content

What's hot

FalsyValues. Dmitry Soshnikov - ECMAScript 6 by
FalsyValues. Dmitry Soshnikov - ECMAScript 6FalsyValues. Dmitry Soshnikov - ECMAScript 6
FalsyValues. Dmitry Soshnikov - ECMAScript 6Dmitry Soshnikov
30.5K views58 slides
Lightweight wrapper for Hive on Amazon EMR by
Lightweight wrapper for Hive on Amazon EMRLightweight wrapper for Hive on Amazon EMR
Lightweight wrapper for Hive on Amazon EMRShinji Tanaka
2.1K views15 slides
Introduction into ES6 JavaScript. by
Introduction into ES6 JavaScript.Introduction into ES6 JavaScript.
Introduction into ES6 JavaScript.boyney123
1.7K views43 slides
Testing Backbone applications with Jasmine by
Testing Backbone applications with JasmineTesting Backbone applications with Jasmine
Testing Backbone applications with JasmineLeon van der Grient
1.9K views23 slides
Mastering Java ByteCode by
Mastering Java ByteCodeMastering Java ByteCode
Mastering Java ByteCodeEcommerce Solution Provider SysIQ
1.7K views50 slides
ES6 - Next Generation Javascript by
ES6 - Next Generation JavascriptES6 - Next Generation Javascript
ES6 - Next Generation JavascriptRamesh Nair
12.3K views147 slides

What's hot(20)

JavaScript on the GPU

  • 1. If you don’t get this ref...shame on you
  • 2. Jarred Nicholls @jarrednicholls jarred@webkit.org
  • 3. Work @ Sencha Web Platform Team Doing webkitty things...
  • 7. What I’ll blabber about today: Why JavaScript on the GPU; Running JavaScript on the GPU; What’s to come...
  • 8. Why JavaScript on the GPU?
  • 9. Why JavaScript on the GPU? Better question: Why a GPU?
  • 10. Why JavaScript on the GPU? Better question: Why a GPU? A: They’re fast! (well, at certain things...)
  • 11. GPUs are fast b/c...
      - Totally different paradigm from CPUs
      - Data parallelism vs. Task parallelism
      - Stream processing vs. Sequential processing
      - GPUs can divide-and-conquer
      - Hardware capable of a large number of “threads”, e.g. ATI Radeon HD 6770m: 480 stream processing units == 480 cores
      - Typically very high memory bandwidth
      - Many, many GigaFLOPs
  • 12. GPUs don’t solve all problems
      - Not all tasks can be accelerated by GPUs
      - Tasks must be parallelizable, i.e.: side effect free; homogeneous and/or streamable
      - Overall tasks will become limited by Amdahl’s Law
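Amdahl's Law, mentioned on the slide above, can be made concrete: if a fraction p of the run time can be spread over n cores and the rest stays serial, the speedup is at most 1 / ((1 - p) + p / n). A minimal sketch in C follows; the function name `amdahl_speedup` is made up for illustration and is not part of the talk.

```c
/* Upper bound on the overall speedup when a fraction `p` (0..1) of the
 * work runs on `n` cores and the remaining (1 - p) stays serial.
 * Hypothetical helper for illustration only. */
double amdahl_speedup(double p, unsigned n)
{
    return 1.0 / ((1.0 - p) + p / (double)n);
}
```

With p = 0.9 and the 480 stream processors quoted earlier, the bound is just under 10x, which is why the serial part of a task ends up dominating.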
  • 16. LateralJS. Our Mission: To make JavaScript a first-class citizen on all GPUs and take advantage of hardware accelerated operations & data parallelization.
  • 17. Our Options
      OpenCL                      Nvidia CUDA
      AMD, Nvidia, Intel, etc.    Nvidia only
      A shitty version of C99     C++ (C for CUDA)
      No dynamic memory           Dynamic memory
      No recursion                Recursion
      No function pointers        Function pointers
      Terrible tooling            Great dev. tooling
      Immature (arguably)         More mature (arguably)
  • 19. Why not a Static Compiler?
      - We want full JavaScript support: Object / prototype; Closures; Recursion; Functions as objects; Variable typing
      - Type Inference limitations
      - Reasonably limited to size and complexity of “kernel-esque” functions
      - Not nearly insane enough
  • 21. Why an Interpreter?
      - We want it all baby - full JavaScript support!
      - Most insane approach
      - Challenging to make it good, but holds a lot of promise
  • 24. Oh the agony...
      - Multiple memory spaces - pointer hell
      - No recursion - all inlined functions
      - No standard libc libraries
      - No dynamic memory
      - No standard data structures - apart from vector ops
      - Buggy ass AMD/Nvidia compilers
  • 26. Multiple Memory Spaces. In the order of fastest to slowest:
      speed                          space             description
      very fast                      private           stream processor cache (~64KB); scoped to a single work item
      fast                           local             ~= L1 cache on CPUs (~64KB); scoped to a single work group
      slow, by orders of magnitude   global, constant  ~= system memory over slow bus; available to all work groups/items; all the VRAM on the card (MBs)
  • 27. Memory Space Pointer Hell
      global uchar* gptr = 0x1000;
      local uchar* lptr = (local uchar*) gptr; // FAIL!
      uchar* pptr = (uchar*) gptr;             // FAIL! private is implicit
      [Diagram: address 0x1000 shown in each of the global, local and private spaces]
      0x1000 points to something different depending on the address space!
  • 28. Memory Space Pointer Hell. Pointers must always be fully qualified. Macros to help ease the pain:
      #define GPTR(TYPE) global TYPE*
      #define CPTR(TYPE) constant TYPE*
      #define LPTR(TYPE) local TYPE*
      #define PPTR(TYPE) private TYPE*
  • 29. No Recursion!?!?!? No call stack. All functions are inlined to the kernel function.
      uint factorial(uint n) {
          if (n <= 1)
              return 1;
          else
              return n * factorial(n - 1); // compile-time error
      }
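Because every function is inlined into the kernel, a self-calling function like the one above has to be rewritten as a loop (or driven by an explicit stack). A minimal sketch of the loop form, assuming a 32-bit `uint`; the name `factorial_iter` is illustrative and not from the talk:

```c
typedef unsigned int uint;

/* Loop form of factorial: no self-call, so the compiler can inline it.
 * With a 32-bit uint the result overflows for n > 12. */
uint factorial_iter(uint n)
{
    uint result = 1;
    for (uint i = 2; i <= n; i++)
        result *= i;
    return result;
}
```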
  • 30. No standard libc libraries memcpy? strcpy? strcmp? etc...
  • 31. No standard libc libraries. Implement our own:
      #define MEMCPY(NAME, DEST_AS, SRC_AS) \
          DEST_AS void* NAME(DEST_AS void*, SRC_AS const void*, uint); \
          DEST_AS void* NAME(DEST_AS void* dest, SRC_AS const void* src, uint size) { \
              DEST_AS uchar* cDest = (DEST_AS uchar*)dest; \
              SRC_AS const uchar* cSrc = (SRC_AS const uchar*)src; \
              for (uint i = 0; i < size; i++) \
                  cDest[i] = cSrc[i]; \
              return (DEST_AS void*)cDest; \
          }
      PTR_MACRO_DEST_SRC(MEMCPY, memcpy)
      Produces:
          memcpy_g    memcpy_gc   memcpy_lc   memcpy_pc
          memcpy_l    memcpy_gl   memcpy_lg   memcpy_pg
          memcpy_p    memcpy_gp   memcpy_lp   memcpy_pl
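The macro trick above can be tried on an ordinary host compiler. In the sketch below the OpenCL address-space qualifiers are stubbed out as empty macros so that plain C accepts the code; that stubbing, the macro name `DEFINE_MEMCPY`, and the reading of the suffixes (first letter destination space, second letter source space) are assumptions made for illustration, since the slides do not show `PTR_MACRO_DEST_SRC` itself.

```c
/* On the host there are no address spaces, so the qualifiers expand to
 * nothing here (assumption for this sketch only). */
#ifndef __OPENCL_VERSION__
#define global
#define local
typedef unsigned char uchar;
typedef unsigned int uint;
#endif

/* Defines one byte-wise copy routine per (destination, source) pair. */
#define DEFINE_MEMCPY(NAME, DEST_AS, SRC_AS)                            \
    DEST_AS void* NAME(DEST_AS void* dest, SRC_AS const void* src,      \
                       uint size)                                       \
    {                                                                   \
        DEST_AS uchar* cDest = (DEST_AS uchar*)dest;                    \
        SRC_AS const uchar* cSrc = (SRC_AS const uchar*)src;            \
        for (uint i = 0; i < size; i++)                                 \
            cDest[i] = cSrc[i];                                         \
        return dest;                                                    \
    }

DEFINE_MEMCPY(memcpy_g,  global, global)  /* global <- global */
DEFINE_MEMCPY(memcpy_gl, global, local)   /* global <- local  */
DEFINE_MEMCPY(memcpy_lg, local,  global)  /* local  <- global */
```

Each variant has the same body; only the qualifiers differ, which is what lets one macro cover every combination of address spaces.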
  • 32. No dynamic memory No malloc() No free() What to do...
  • 33. Yes! dynamic memory
      - Create a large buffer of global memory - our “heap”
      - Implement our own malloc() and free()
      - Create a handle structure - “virtual memory”
      - P(T, hnd) macro to get the current pointer address
      GPTR(handle) hnd = malloc(sizeof(uint));
      GPTR(uint) ptr = P(uint, hnd);
      *ptr = 0xdeadbeef;
      free(hnd);
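The handle idea above can be sketched on the host in plain C. This is not the LateralJS allocator: the struct layout, the names `lateral_heap`, `heap_malloc`, `heap_free`, and the three-argument `P` macro are all invented for illustration, and the allocator is a simple bump allocator that never reuses freed space. The point it shows is the indirection: callers hold a handle, and the pointer is looked up each time it is needed.

```c
#include <stdint.h>

#define HEAP_WORDS     256u                 /* heap storage, in 32-bit words */
#define HEAP_SIZE      (HEAP_WORDS * 4u)    /* heap storage, in bytes */
#define MAX_HANDLES    32u
#define INVALID_HANDLE UINT32_MAX

typedef struct {
    uint32_t offset;   /* where the block currently lives in the heap */
    uint32_t size;     /* block size in bytes, rounded up to 4 */
    int      in_use;
} handle_entry;

typedef struct {
    uint32_t     heap[HEAP_WORDS];          /* one big preallocated buffer */
    handle_entry handles[MAX_HANDLES];
    uint32_t     top;                       /* next free byte (bump pointer) */
} lateral_heap;

/* Returns a handle index, or INVALID_HANDLE when out of space or handles. */
uint32_t heap_malloc(lateral_heap* h, uint32_t size)
{
    if (size > HEAP_SIZE)
        return INVALID_HANDLE;
    uint32_t rounded = (size + 3u) & ~3u;
    if (rounded > HEAP_SIZE - h->top)
        return INVALID_HANDLE;
    for (uint32_t i = 0; i < MAX_HANDLES; i++) {
        if (!h->handles[i].in_use) {
            h->handles[i].offset = h->top;
            h->handles[i].size = rounded;
            h->handles[i].in_use = 1;
            h->top += rounded;
            return i;
        }
    }
    return INVALID_HANDLE;
}

/* Releases the handle slot; this sketch does not reclaim the bytes. */
void heap_free(lateral_heap* h, uint32_t hnd)
{
    if (hnd < MAX_HANDLES)
        h->handles[hnd].in_use = 0;
}

/* Resolve a handle to a typed pointer at the block's current location. */
#define P(TYPE, HEAP, HND) \
    ((TYPE*)((unsigned char*)(HEAP)->heap + (HEAP)->handles[HND].offset))
```

Because callers never cache raw pointers, an implementation along these lines would be free to move blocks (for example during garbage collection) and only update the handle table.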
  • 35. Ok, we get the point... FYL!
  • 36. High-level Architecture. [Diagram] Boxes labelled Host and GPUs; components: V8, Esprima Parser, Data Serializer & Marshaller, Device Mgr, Data Heap, Stack-based Interpreter, Garbage Collector
  • 37. High-level Architecture. Step added to the diagram: eval(code); => Build JSON AST
  • 38. High-level Architecture. Step added to the diagram: Serialize AST (JSON => C Structs)
  • 39. High-level Architecture. Step added to the diagram: Ship to GPU to Interpret
  • 40. High-level Architecture. Step added to the diagram: Fetch Result
  • 42. AST Generation: JavaScript Source => Esprima in V8 => JSON AST (v8::Object) => Lateral AST (C structs)
  • 43. Embed esprima.js. Resource Generator:
      $ resgen esprima.js resgen_esprima_js.c
  • 44. Embed esprima.js. resgen_esprima_js.c:
      const unsigned char resgen_esprima_js[] = {
          0x2f, 0x2a, 0x0a, 0x20, 0x20, 0x43, 0x6f, 0x70, 0x79, 0x72,
          0x69, 0x67, 0x68, 0x74, 0x20, 0x28, 0x43, 0x29, 0x20, 0x32,
          ...
          0x20, 0x3a, 0x20, 0x2a, 0x2f, 0x0a, 0x0a, 0
      };
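  • 45. Embed esprima.js. ASTGenerator.cpp:
      // Declared as an array to match the definition in resgen_esprima_js.c
      extern const unsigned char resgen_esprima_js[];

      void ASTGenerator::init()
      {
          HandleScope scope;
          s_context = Context::New();
          s_context->Enter();
          // The generated array is NUL-terminated, so it can be read as a C string
          Handle<Script> script = Script::Compile(
              String::New(reinterpret_cast<const char*>(resgen_esprima_js)));
          script->Run();
          s_context->Exit();
          s_initialized = true;
      }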
  • 46. Build JSON AST e.g. ASTGenerator::esprimaParse( "var xyz = new Array(10);" );
  • 47. Build JSON AST
      Handle<Object> ASTGenerator::esprimaParse(const char* javascript)
      {
          if (!s_initialized)
              init();
          HandleScope scope;
          s_context->Enter();
          Handle<Object> global = s_context->Global();
          Handle<Object> esprima = Handle<Object>::Cast(global->Get(String::New("esprima")));
          Handle<Function> esprimaParse = Handle<Function>::Cast(esprima->Get(String::New("parse")));
          Handle<String> code = String::New(javascript);
          Handle<Object> ast = Handle<Object>::Cast(esprimaParse->Call(esprima, 1, (Handle<Value>*)&code));
          s_context->Exit();
          return scope.Close(ast);
      }
  • 48. Build JSON AST
      {
          "type": "VariableDeclaration",
          "declarations": [
              {
                  "type": "VariableDeclarator",
                  "id": { "type": "Identifier", "name": "xyz" },
                  "init": {
                      "type": "NewExpression",
                      "callee": { "type": "Identifier", "name": "Array" },
                      "arguments": [ { "type": "Literal", "value": 10 } ]
                  }
              }
          ],
          "kind": "var"
      }
  • 49. Lateral AST structs. Structs shared between Host and OpenCL:
      #ifdef __OPENCL_VERSION__
      #define CL(TYPE) TYPE
      #else
      #define CL(TYPE) cl_##TYPE
      #endif

      typedef struct ast_type_st {
          CL(uint) id;
          CL(uint) size;
      } ast_type;

      typedef struct ast_program_st {
          ast_type type;
          CL(uint) body;
          CL(uint) numBody;
      } ast_program;

      typedef struct ast_identifier_st {
          ast_type type;
          CL(uint) name;
      } ast_identifier;
  • 50. Lateral AST structs. v8::Object => ast_type, expanded:
      ast_type* vd1_1_init_id = (ast_type*)astCreateIdentifier("Array");
      ast_type* vd1_1_init_args[1];
      vd1_1_init_args[0] = (ast_type*)astCreateNumberLiteral(10);
      ast_type* vd1_1_init = (ast_type*)astCreateNewExpression(vd1_1_init_id, vd1_1_init_args, 1);
      free(vd1_1_init_id);
      for (int i = 0; i < 1; i++)
          free(vd1_1_init_args[i]);
      ast_type* vd1_1_id = (ast_type*)astCreateIdentifier("xyz");
      ast_type* vd1_decls[1];
      vd1_decls[0] = (ast_type*)astCreateVariableDeclarator(vd1_1_id, vd1_1_init);
      free(vd1_1_id);
      free(vd1_1_init);
      ast_type* vd1 = (ast_type*)astCreateVariableDeclaration(vd1_decls, 1, "var");
      for (int i = 0; i < 1; i++)
          free(vd1_decls[i]);
  • 51. Lateral AST structs. astCreateIdentifier:
      ast_identifier* astCreateIdentifier(const char* str)
      {
          CL(uint) size = sizeof(ast_identifier) + rnd(strlen(str) + 1, 4);
          ast_identifier* ast_id = (ast_identifier*)malloc(size);
          // copy the string
          strcpy((char*)(ast_id + 1), str);
          // fill the struct
          ast_id->type.id = AST_IDENTIFIER;
          ast_id->type.size = size;
          ast_id->name = sizeof(ast_identifier); // offset
          return ast_id;
      }
  • 52. Lateral AST structs. astCreateIdentifier("xyz"):
      offset   field       value
      0        type.id     AST_IDENTIFIER (0x01)
      4        type.size   16
      8        name        12 (offset)
      12       str[0]      'x'
      13       str[1]      'y'
      14       str[2]      'z'
      15       str[3]      '\0'
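The layout in the table above can be checked with a host-only version of `astCreateIdentifier`. The sketch below makes a few assumptions that the slides do not spell out: `CL(uint)` is replaced by `uint32_t`, `rnd(n, 4)` is taken to round up to a multiple of 4, and `calloc` is used instead of `malloc` so the padding bytes are zeroed.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define AST_IDENTIFIER 0x01u

typedef struct { uint32_t id; uint32_t size; } ast_type;
typedef struct { ast_type type; uint32_t name; } ast_identifier;

/* Round `n` up to the next multiple of `align` (a power of two).
 * Assumed behaviour of the rnd() helper used on the slides. */
static uint32_t rnd(uint32_t n, uint32_t align)
{
    return (n + align - 1u) & ~(align - 1u);
}

/* Header and string live in one contiguous allocation; `name` is the
 * byte offset of the string from the start of the node. */
ast_identifier* astCreateIdentifier(const char* str)
{
    uint32_t size = (uint32_t)sizeof(ast_identifier)
                  + rnd((uint32_t)strlen(str) + 1u, 4u);
    ast_identifier* ast_id = (ast_identifier*)calloc(1, size);
    if (!ast_id)
        return NULL;
    strcpy((char*)(ast_id + 1), str);
    ast_id->type.id = AST_IDENTIFIER;
    ast_id->type.size = size;
    ast_id->name = (uint32_t)sizeof(ast_identifier);
    return ast_id;
}
```

For "xyz" this gives a 12-byte header plus 4 bytes of string, matching the 16-byte node and the offset of 12 shown in the table.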
  • 53. Lateral AST structs. astCreateNewExpression:
      ast_expression_new* astCreateNewExpression(ast_type* callee, ast_type** arguments, int numArgs)
      {
          CL(uint) size = sizeof(ast_expression_new) + callee->size;
          for (int i = 0; i < numArgs; i++)
              size += arguments[i]->size;
          ast_expression_new* ast_new = (ast_expression_new*)malloc(size);
          ast_new->type.id = AST_NEW_EXPR;
          ast_new->type.size = size;
          CL(uint) offset = sizeof(ast_expression_new);
          char* dest = (char*)ast_new;
          // copy callee
          memcpy(dest + offset, callee, callee->size);
          ast_new->callee = offset;
          offset += callee->size;
          // copy arguments
          if (numArgs) {
              ast_new->arguments = offset;
              for (int i = 0; i < numArgs; i++) {
                  ast_type* arg = arguments[i];
                  memcpy(dest + offset, arg, arg->size);
                  offset += arg->size;
              }
          } else
              ast_new->arguments = 0;
          ast_new->numArguments = numArgs;
          return ast_new;
      }
  • 54. Lateral AST structs. new Array(10):
      offset   field            value
      0        type.id          AST_NEW_EXPR (0x308)
      4        type.size        52
      8        callee           20 (offset)
      12       arguments        40 (offset)
      16       numArguments     1
      20       callee node      ast_identifier (“Array”)
      40       arguments node   ast_literal_number (10)
  • 55. Lateral AST structs
      - Shared across the Host and the OpenCL runtime
      - Host writes, Lateral reads
      - Constructed on Host as contiguous blobs
      - Easy to send to GPU: memcpy(gpu, ast, ast->size);
      - Fast to send to GPU, single buffer write
      - Simple to traverse w/ pointer arithmetic
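"Simple to traverse w/ pointer arithmetic" follows from two properties of the blobs: child positions are stored as byte offsets, and every node starts with its own `size`. A minimal host-side sketch of both steps is below; `ast_at` and `ast_span` are hypothetical helper names, and `uint32_t` stands in for `CL(uint)`.

```c
#include <stdint.h>

typedef struct { uint32_t id; uint32_t size; } ast_type;

/* Node stored `offset` bytes from the start of the blob at `base`. */
static const ast_type* ast_at(const void* base, uint32_t offset)
{
    return (const ast_type*)((const unsigned char*)base + offset);
}

/* Walks `count` nodes packed back to back starting at `first_offset`,
 * stepping by each node's own size, and returns the bytes covered. */
uint32_t ast_span(const void* blob, uint32_t first_offset, uint32_t count)
{
    uint32_t offset = first_offset;
    for (uint32_t i = 0; i < count; i++)
        offset += ast_at(blob, offset)->size;
    return offset - first_offset;
}
```

No pointers are stored inside the blob, so the same bytes stay valid after being copied into a different address space.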
  • 57. Building Blocks: JS Type Structs; AST Traverse Stack; Lateral State; Call/Exec Stack; Heap; Symbol/Ref Table; Return Stack; Scope Stack; AST Traverse Loop; Interpret Loop
  • 58. Kernels
      #include "state.h"
      #include "jsvm/asttraverse.h"
      #include "jsvm/interpreter.h"

      // Setup VM structures
      kernel void lateral_init(GPTR(uchar) lateral_heap)
      {
          LATERAL_STATE_INIT
      }

      // Interpret the AST
      kernel void lateral(GPTR(uchar) lateral_heap, GPTR(ast_type) lateral_ast)
      {
          LATERAL_STATE
          ast_push(lateral_ast);
          while (!Q_EMPTY(lateral_state->ast_stack, ast_q) || !Q_EMPTY(lateral_state->call_stack, call_q)) {
              while (!Q_EMPTY(lateral_state->ast_stack, ast_q))
                  traverse();
              if (!Q_EMPTY(lateral_state->call_stack, call_q))
                  interpret();
          }
      }
  • 60. var x = 1 + 2; Stacks (entries listed in the order shown on the slide): AST: (empty) | Call: (empty) | Return: (empty). JSON AST:
      {
          "type": "VariableDeclaration",
          "declarations": [
              {
                  "type": "VariableDeclarator",
                  "id": { "type": "Identifier", "name": "x" },
                  "init": {
                      "type": "BinaryExpression",
                      "operator": "+",
                      "left": { "type": "Literal", "value": 1 },
                      "right": { "type": "Literal", "value": 2 }
                  }
              }
          ],
          "kind": "var"
      }
  • 61. var x = 1 + 2; AST: VarDecl | Call: (empty) | Return: (empty)
  • 62. var x = 1 + 2; AST: VarDtor | Call: (empty) | Return: (empty)
  • 63. var x = 1 + 2; AST: Ident, Binary | Call: VarDtor | Return: (empty)
  • 64. var x = 1 + 2; AST: Ident, Literal, Literal | Call: VarDtor, Binary | Return: (empty)
  • 65. var x = 1 + 2; AST: Ident, Literal, Literal | Call: VarDtor, Binary | Return: (empty)
  • 66. var x = 1 + 2; AST: Ident | Call: VarDtor, Binary, Literal, Literal | Return: (empty)
  • 67. var x = 1 + 2; AST: (empty) | Call: VarDtor, Binary, Literal, Literal, Ident | Return: (empty)
  • 68. var x = 1 + 2; AST: (empty) | Call: VarDtor, Binary, Literal, Literal | Return: “x”
  • 69. var x = 1 + 2; AST: (empty) | Call: VarDtor, Binary, Literal | Return: “x”, 1
  • 70. var x = 1 + 2; AST: (empty) | Call: VarDtor, Binary | Return: “x”, 1, 2
  • 71. var x = 1 + 2; AST: (empty) | Call: VarDtor | Return: “x”, 3
  • 72. var x = 1 + 2; AST: (empty) | Call: (empty) | Return: (empty)
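The walkthrough above can be reproduced in miniature. The sketch below evaluates only literals and `+`, using three stacks driven by the same outer loop shape as the `lateral` kernel: drain the AST stack with `traverse()`, then run one `interpret()` step. The node layout, the names, and the fixed stack depth are invented for illustration, and there are no overflow checks.

```c
#include <stddef.h>

enum node_kind { NODE_LITERAL, NODE_ADD };

typedef struct node {
    enum node_kind kind;
    double value;               /* used by NODE_LITERAL */
    const struct node* left;    /* used by NODE_ADD */
    const struct node* right;   /* used by NODE_ADD */
} node;

#define STACK_MAX 16            /* enough for the tiny examples here */

typedef struct {
    const node* ast[STACK_MAX];  size_t n_ast;   /* nodes still to visit */
    const node* call[STACK_MAX]; size_t n_call;  /* nodes still to execute */
    double      ret[STACK_MAX];  size_t n_ret;   /* computed values */
} vm;

/* Move one node from the AST stack to the call stack and queue its children. */
static void traverse(vm* m)
{
    const node* n = m->ast[--m->n_ast];
    m->call[m->n_call++] = n;
    if (n->kind == NODE_ADD) {
        m->ast[m->n_ast++] = n->left;
        m->ast[m->n_ast++] = n->right;
    }
}

/* Execute the node on top of the call stack. */
static void interpret(vm* m)
{
    const node* n = m->call[--m->n_call];
    if (n->kind == NODE_LITERAL) {
        m->ret[m->n_ret++] = n->value;
    } else {
        double b = m->ret[--m->n_ret];
        double a = m->ret[--m->n_ret];
        m->ret[m->n_ret++] = a + b;
    }
}

double evaluate(const node* root)
{
    vm m = {0};
    m.ast[m.n_ast++] = root;
    while (m.n_ast > 0 || m.n_call > 0) {
        while (m.n_ast > 0)
            traverse(&m);
        if (m.n_call > 0)
            interpret(&m);
    }
    return m.ret[0];
}
```

Even for `1 + 2` every node is pushed and popped on two stacks and every value passes through a third, which is the traffic the later slides count as "stack hits".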
  • 74. Benchmark. Small loop of FLOPs:
      var input = new Array(10);
      for (var i = 0; i < input.length; i++) {
          input[i] = Math.pow((i + 1) / 1.23, 3);
      }
  • 75. Execution Time
      runtime   Lateral (GPU CL)    Lateral (CPU CL)           V8
      device    ATI Radeon 6770m    Intel Core i7 4x2.4GHz     Intel Core i7 4x2.4GHz
      time      116.571533 ms       0.226007 ms                0.090664 ms
  • 78. What went wrong? Everything. Stack-based AST interpreter, no optimizations. Heavy global memory access, no optimizations. No data or task parallelism.
  • 79. Stack-based Interpreter: Slow as molasses. Memory hog, Eclipse style. Heavy memory access: “var x = 1 + 2;” == 30 stack hits alone! Too much dynamic allocation. No inline optimizations, just following the yellow brick AST. Straight up lazy. Replace with something better! Bytecode compiler on Host; register-based bytecode interpreter on Device.
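To make the proposed replacement concrete, here is a minimal sketch of a host-side bytecode compiler and a register-based interpreter loop, both written in JavaScript for readability. Everything in it is an assumption made for illustration: the opcode names (`LOAD_CONST`, `ADD`, `STORE_VAR`), the instruction layout, and the function names are hypothetical and do not describe an actual LateralJS bytecode format.

```javascript
// Hypothetical opcodes and instruction layout, invented for this sketch.
const OP = { LOAD_CONST: 0, ADD: 1, STORE_VAR: 2 };

const ast = {
  type: "VariableDeclaration",
  kind: "var",
  declarations: [{
    type: "VariableDeclarator",
    id: { type: "Identifier", name: "x" },
    init: {
      type: "BinaryExpression",
      operator: "+",
      left: { type: "Literal", value: 1 },
      right: { type: "Literal", value: 2 },
    },
  }],
};

// Host side: recursion is fine here because this part runs on the CPU.
// Emits instructions into `code` and returns the register holding the value.
function compileExpr(node, code, state) {
  switch (node.type) {
    case "Literal": {
      const r = state.nextReg++;
      code.push([OP.LOAD_CONST, r, node.value]);
      return r;
    }
    case "BinaryExpression": {
      if (node.operator !== "+") throw new Error("unsupported operator");
      const a = compileExpr(node.left, code, state);
      const b = compileExpr(node.right, code, state);
      const r = state.nextReg++;
      code.push([OP.ADD, r, a, b]);
      return r;
    }
    default:
      throw new Error("unsupported node type: " + node.type);
  }
}

// Compiles a VariableDeclaration whose declarators all have initializers.
function compile(declaration) {
  const code = [];
  const state = { nextReg: 0 };
  for (const d of declaration.declarations) {
    const r = compileExpr(d.init, code, state);
    code.push([OP.STORE_VAR, d.id.name, r]);
  }
  return { code, registerCount: state.nextReg };
}

// Device side (modeled in JavaScript): a flat loop over instructions with a
// fixed-size register file, no recursion and no per-node allocation.
function interpret({ code, registerCount }) {
  const regs = new Array(registerCount).fill(undefined);
  const vars = {};
  for (const [op, a, b, c] of code) {
    switch (op) {
      case OP.LOAD_CONST: regs[a] = b; break;
      case OP.ADD: regs[a] = regs[b] + regs[c]; break;
      case OP.STORE_VAR: vars[a] = regs[b]; break;
      default: throw new Error("unknown opcode: " + op);
    }
  }
  return vars;
}

const bytecode = compile(ast);
const vars = interpret(bytecode);
console.log(bytecode.code.length, vars.x);
```

The point of the split is that tree traversal and allocation happen once on the host, while the device only executes a linear instruction stream against a register file whose size is known in advance.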
  • 81. Too much global access. Everything is dynamically allocated to global memory. A register-based interpreter & bytecode compiler can make better use of local and private memory. Optimizing memory access yields crazy results:
      // 11.1207 seconds
      size_t tid = get_global_id(0);
      c[tid] = a[tid];
      while(b[tid] > 0) { // touch global memory on each loop
        b[tid]--; // touch global memory on each loop
        c[tid]++; // touch global memory on each loop
      }
      // 0.0445558 seconds!! HOLY SHIT!
      size_t tid = get_global_id(0);
      int tmp = a[tid]; // temp private variable
      for(int i=b[tid]; i > 0; i--) tmp++; // touch private variables on each loop
      c[tid] = tmp; // touch global memory one time
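The difference between the two kernels on this slide can be illustrated by counting memory touches in a host-side JavaScript model. This is not OpenCL code and says nothing about real timings: `CountedBuffer`, `kernelGlobal`, `kernelPrivate`, and `measure` are hypothetical helpers, and an actual GPU may cache or optimize accesses differently. The sketch only shows how the number of global accesses scales with the loop count in each version.

```javascript
// Host-side model of the two kernels; NOT OpenCL. CountedBuffer is a
// hypothetical stand-in for a global-memory buffer that counts every access.
class CountedBuffer {
  constructor(values, counter) {
    this.values = values.slice();
    this.counter = counter;
  }
  get(i) { this.counter.touches++; return this.values[i]; }
  set(i, v) { this.counter.touches++; this.values[i] = v; }
}

// First kernel: the loop condition and both updates go through global memory.
function kernelGlobal(a, b, c, tid) {
  c.set(tid, a.get(tid));
  while (b.get(tid) > 0) {
    b.set(tid, b.get(tid) - 1);
    c.set(tid, c.get(tid) + 1);
  }
}

// Second kernel: the loop runs on private (local JavaScript) variables only.
function kernelPrivate(a, b, c, tid) {
  let tmp = a.get(tid);                       // one global read
  for (let i = b.get(tid); i > 0; i--) tmp++; // one global read, then private
  c.set(tid, tmp);                            // one global write
}

function measure(kernel, aValue, bValue) {
  const counter = { touches: 0 };
  const a = new CountedBuffer([aValue], counter);
  const b = new CountedBuffer([bValue], counter);
  const c = new CountedBuffer([0], counter);
  kernel(a, b, c, 0);
  // Read the result directly so the check itself is not counted.
  return { result: c.values[0], touches: counter.touches };
}

const slow = measure(kernelGlobal, 7, 1000);
const fast = measure(kernelPrivate, 7, 1000);
console.log(slow, fast);
```

In this model the first version touches the counted buffers five times per iteration plus three fixed accesses, while the second touches them three times in total, regardless of the loop count.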
  • 82. No data or task parallelism. Everything being interpreted in a single “thread”. We have hundreds of cores available to us! Build in heuristics: identify side-effect free statements; break into parallel tasks - very magical. Example: var input = new Array(10); for (var i = 0; i < input.length; i++) { input[i] = Math.pow((i + 1) / 1.23, 3); } becomes the parallel tasks input[0] = Math.pow((0 + 1) / 1.23, 3); input[1] = Math.pow((1 + 1) / 1.23, 3); ... input[9] = Math.pow((9 + 1) / 1.23, 3);
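The reason the benchmark loop can be split this way is that each iteration depends only on its own index. A minimal sketch of that property, assuming the loop body is pulled out into a pure function (the names `kernel`, `sequential`, `parallel`, and `order` are illustrative), runs the per-index tasks in an arbitrary order and checks that the result matches the original loop. It models independence only; it does not actually run anything in parallel.

```javascript
// The benchmark's loop body as a pure function of the index: no shared
// state is read or written, so each call is an independent task.
const kernel = (i) => Math.pow((i + 1) / 1.23, 3);

// Sequential reference: equivalent to the original for loop.
const sequential = new Array(10);
for (let i = 0; i < sequential.length; i++) {
  sequential[i] = kernel(i);
}

// Model of parallel execution: one task per index, completed in an
// arbitrary order (on a GPU each index would map to one work-item).
const order = [7, 2, 9, 0, 4, 1, 8, 3, 6, 5];
const parallel = new Array(10);
for (const i of order) {
  parallel[i] = kernel(i);
}

// Because the tasks are side-effect free, execution order does not matter.
const same = sequential.every((value, i) => value === parallel[i]);
console.log(same);
```

A heuristic that detects this shape (a loop whose body writes only to `input[i]` and reads nothing written by other iterations) is what would let an interpreter hand each index to a separate core.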
  • 83. What’s in store: Acceptable performance on all CL devices. V8/Node extension to launch Lateral tasks. High-level API to perform map-reduce, etc. Lateral-cluster...mmmmm
  • 84. Thanks! Jarred Nicholls @jarrednicholls jarred@webkit.org