
COSCUP 2013 - ThorScript: Programming Language 
for GPU Cloud and Beyond


  1. 小迪克, Founder & CEO: Programming Language for GPU Cloud and Beyond
  2. We have a dream...
  3. There are many world #1s from Taiwan
  4. ...but not many in open source software...
  5. Why? THOR is designed to remove the barriers of heterogeneous computing. THOR is designed to build a new breed of applications that take advantage of the latest accelerator hardware. THOR aims to be the "Next Big Language" (from Taiwan).
  6. "5 TFLOPS and 1 TB/s to Global Memory"
  7. CUDA, OpenCL, OpenACC, Microsoft C++ AMP, DirectX Compute, RenderScript
  8. The GAP: the Scientist's world (CUDA, OpenCL, OpenACC, Microsoft C++ AMP, DirectX Compute, RenderScript) vs. the Hippie's world (JavaScript, WebRTC, HTML5/CSS3, NoSQL, CoffeeScript, WebSocket)
  9. Scientist + Hippie: a Common Language Runtime bridging CUDA, OpenCL, OpenACC, Microsoft C++ AMP, DirectX Compute, RenderScript on one side and JavaScript, WebRTC, HTML5/CSS3, NoSQL, CoffeeScript, WebSocket on the other. EASY, FAST and POWERFUL
  10. What is THOR? (in programmer's terminology) THOR is about Parallelism and Concurrency. THOR is Garbage-Collected. THOR is designed with love for C++. THOR implements the Itanium C++ ABI to link with C++. THOR is based on LLVM and NVVM for the NVIDIA target. THOR runs on both CPU and GPU.
  11. this guy
  12. class MyClass { public function new() { } public function delete() { } public function getName() { return "MyClass" } } @entry function main():int32 { print("Hello World"); return 0; } ECMA-Derived Syntax
  13. class MyClass { public function new() { } public function delete() { } public function getName():String { return "MyClass" } } @entry function main():int32 { print("Hello World"); return 0; } ECMA-Derived Syntax, Strong Static Typing
  14. class MyClass { public function new() { } public function delete() { } public function getName() { return "MyClass" } } @entry function main():int32 { print("Hello World"); return 0; } ECMA-Derived Syntax ...with Type Inference
  15. function hello<T>(v:T) { print(v); } class CheckedArray<T> { public function set(idx:int32, v:T):void { ... } public function get(idx:int32):T { ... } private data:Array<T>; } class CheckedArray<T:int32, Builder = Sum<T>> { public function set(idx:int32, v:T):void { ... } public function get(idx:int32):T { ... } public function build():T { var builder = new Builder(); return builder.build(data); } private data:Array<T>; } Function/Class Template Specialization, Default Template Argument
  16. function dummy() { var x:int32; var y:int32; var adder = new Adder<int32>(); ... var f = lambda() : int32 { return adder.add(x, y); }; } Lambda with Auto Capture, Value-capture Semantics (objects are always in reference form)
  17. @entry function test1() : int32 { // ... var fib = lambda(x : int32) : int32 { if (x < 2) return x; return fib(x-1) + fib(x-2); }; // ... return 0; } Lambda with Auto Capture, Recursive Lambda without Using a Fix-point Combinator
  18. // adder.h template<typename T> class Adder { public: T add(T x, T y) { return x+y; } }; // adder.t @native { include="adder.h" } class Adder<T> { public function add(x:T, y:T):T; } @entry function main():int32 { var a = new Adder<int32>(); var result = a.add(123, 456); return 0; } Seamlessly Integrate with Existing C++ Code: instantiate a C++ template directly in ThorScript
  19. Data Parallelism
  20. Data-Parallelism // kernel for adding two arrays in parallel __global__ void add(int* a, int* b, int* c, int count) { int index = blockIdx.x * blockDim.x + threadIdx.x; if (index < count) c[index] = a[index] + b[index]; } int main() { // prepare arrays a, b, and c cudaMalloc(&a, size*sizeof(int)); ... // launch GPU kernel to add add<<<256, size/256>>>(a, b, c, size); cudaThreadSynchronize(); ... } Complicated
  21. int fib(int n) { if(n<2) return n; int x = cilk_spawn fib(n-2); int y = fib(n-1); cilk_sync; return x+y; } int main() { int n = fib(10); std::cout << n; return 0; } CilkPlus
  22. task fib(n:int32):int32 { if(n<2) return n; var a, b; flow -> { a = fib(n-1); b = fib(n-2); } return a+b; } task main() { var n:int32; pipeline -> { async -> n = fib(10); print(n); } } Express parallelism by flow, async, and pipeline. Every statement inside flow runs in parallel; the tasks merge and execution continues after the block
  23. task fib(n:int32):int32 { if(n<2) return n; var a, b; flow -> { a = fib(n-1); b = fib(n-2); } return a+b; } task main() { var n:int32; pipeline -> { async -> n = fib(10); print(n); } } Express parallelism by flow, async, and pipeline. async creates an async task
  24. task fib(n:int32):int32 { if(n<2) return n; var a, b; flow -> { a = fib(n-1); b = fib(n-2); } return a+b; } task main() { var n:int32; pipeline -> { async -> n = fib(10); print(n); } } Express parallelism by flow, async, and pipeline. pipeline converts the block into continuation-passing style (CPS)
  25. @kernel function add(a:Array<int32>, b:Array<int32>, c:Array<int32>) { var idx = getGlobalIndex(); c[idx] = a[idx] + b[idx]; } task main() { var a = [0, 1, 2, 3]; var b = [0, 1, 2, 3]; var c = <int32>[4]; async[a.size()] -> add(a, b, c); } You can still use a data-parallel kernel...
  26. @kernel function compute(a:Array<int32>, b:Array<int32>, c:Array<int32>) { var idx = getGlobalIndex(); if(idx == 0) { c.fill(0); a.copyFrom(b); } ... } task main() { ... async[a.size()] -> compute(a, b, c); } Hidden DMA Warp for Memory Operations: the actual copy is done by the hidden DMA warp
  27. var counter:int32 = 0; function update():int32 { var n; atomic -> { if(counter % 2 == 0) counter+=2; n = counter; } return n; } @entry function main() { pipeline -> { async[1024] -> update(); print(counter); } } Transactional Memory Block (STM/HTM): all memory accesses within an atomic block are transactional
  28. var counter:int32 = 0; function update():int32 { var n; atomic -> { n = ++counter; } return n; } @entry function main() { pipeline -> { async[1024] -> update(); print(counter); } } Transactional Memory Block (STM/HTM): a simple transaction is converted into an atomic add/cmp_exchange instruction
  29. @server function compute(n:Request):int32 { ... } @client function run_at_client() { ... pipeline -> { remote[Domain.caller()] -> var n = compute(request); print(n); } } @server // run: tsc r --server main task main() { // prepare the network manager // setup the network listener... Domain.watch(DomainEvent.Connected, lambda(d:Domain):void { remote[d] -> run_at_client(); }); } Remote Procedure Call (RPC) with Automatic Object Replication: code in different execution domains invokes each other through "remote"
  30. Full DWARF Support, Debug THOR in GDB
  31. With GPUDirect & NVM Express: Big Data and real-time analytics applications, web apps, GPU-accelerated databases, a filesystem on GPU (still work-in-progress...)
  32. ...is still evolving and changing every day
  33. ...and we'd like to organize a small think tank (5~10 people). Let us know your idea, and we will implement it! Please share your thoughts on programming languages and send email to sdk@zillians.com
  34. And now, a little advertisement...
  35. SINGULARITY (HACKERSPACE) 奇異點
  36. SINGULARITY (HACKERSPACE) 奇異點 # ~300 ping (≈990 m²) # next to Daan MRT station
  37. Computer Science education is dead and out-dated
  38. What Education Needs is NOT Evolution but Revolution (Ken Robinson)
  39. to Aggregate Talents
  40. by Creative Workshop
  41. c-base hackerspace (Berlin)
  42. NoiseBridge (Bay Area)
  43. HackerDojo (Bay Area)
  44. Artisan's Asylum (Boston)
  45. XinCheJian/新車間 (Shanghai)
  46. 3D Printers, Laser Cutter, Linux Boards, Components
  47. SW/HW Hackers
  48. SW/HW Hackers + Designer
  49. DREAMER
  50. DREAMER + DOER/MAKER
  51. DOER/MAKER + COMMUNITY
  52. CHANGE
  53. IMPACT
  54. "where there's hardship, there's opportunity" Q&A
