Cloud, Distributed, Embedded: Erlang in the Heterogeneous Computing World

  1. Cloud, Distributed, Embedded. Erlang in the Heterogeneous Computing World Omer Kilic || @OmerK omer@erlang-solutions.com
  2. Outline
     • Challenges in modern computing systems
     • Heterogeneous computing
     • Co-processors and accelerators
     • Programming models and tools
     • Alternate architectures
     • Parallella Vision System
     • Erlang Embedded Project
     • Q&A
     10/12/2013 Build Stuff 2013 Slide 2 of 46
  3. Challenges: Software
     • Frequency wall
     • Memory bottlenecks
     • Software complexity
  4. Amdahl’s Law
     • “…the maximum speed-up through parallel processing is set by the amount of code which has to run serial”
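The quoted statement is usually written as the formula below. The symbols are chosen here for illustration and do not appear on the slide: p is the fraction of the work that can run in parallel, n is the number of processing units, and S(n) is the resulting speed-up.

```latex
S(n) = \frac{1}{(1 - p) + \dfrac{p}{n}},
\qquad
\lim_{n \to \infty} S(n) = \frac{1}{1 - p}
```

As a worked example under assumed numbers: with p = 0.9 and n = 64 cores, S = 1 / (0.1 + 0.9/64) ≈ 8.77, and no number of cores can push the speed-up past 1 / 0.1 = 10.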
  5. Challenges: Hardware
     • Yield issues
     • Wiring and interconnect
     • Thermal density
     • Power consumption
     End of Moore’s law imminent…
  6. Challenges
     “With nearly 10 billion devices connected to the internet and predictions for exponential growth, we’ve reached a point where the space, power, and cost demands of traditional technology are no longer sustainable.”
     Meg Whitman, President and CEO, HP
  7. Internet of Things
  8. Device Architectures (I)
  9. Device Architectures (II)
  10. Heterogeneous Computing (I)
      • Special purpose, highly specialised architectures will outperform general purpose processing devices
        – Possibly by orders of magnitude
        – In terms of energy efficiency as well as raw speed
        – Parallel execution is key
      • Non-programmable/pseudo-programmable accelerators: ASIC, DSP, GPU, …
      • Fully programmable accelerators: FPGAs
  11. Open Compute Project
  12. Heterogeneous Computing (II)
  13. GPUs
  14. Anatomy of a GPU
  15. Co-processors: NetFPGA 10G
  16. Co-processors: Generic COTS devices
  17. Landscape of accelerator programming

      Interface    CUDA                    OpenCL                          DirectCompute       RenderScript
      Originator   NVIDIA                  Khronos (Apple)                 Microsoft           Google
      Year         2007                    2008                            2009                2011
      Area         HPC, desktop            Desktop, mobile, embedded, HPC  Desktop             Mobile
      OS           Windows, Linux, Mac OS  Windows, Linux, Mac OS (10.6+)  Windows (Vista+)    Android (3.0+)
      Devices      GPUs (NVIDIA)           CPUs, GPUs, custom              GPUs (NVIDIA, AMD)  CPUs, GPUs, DSPs
      Work unit    Kernel                  Kernel                          Compute shader      Compute script
      Language     CUDA C/C++              OpenCL C                        HLSL                Script C
      Distributed  Source, PTX             Source                          Source, bytecode    LLVM bitcode

      From: “The landscape of accelerator programming: a view from ARM”, Lokhmotov, A., 3rd UK GPU Computing Conference, London
  18. Accelerator types
      • Programmable accelerators
        – CPU vector extensions: x86/SSE/AVX, PowerPC/VMX, ARM/NEON
        – GPUs supporting general-purpose computing (GPGPUs)
        – Sony/Toshiba/IBM Cell (Sony PlayStation 3, HPC)
        – ClearSpeed CSX (HPC, embedded)
        – Adapteva Epiphany (HPC, mobile)
        – Intel MIC (HPC)
  19. Programming accelerators
      • Proprietary low-level APIs, typically C-based:
        – Vector intrinsics
        – NVIDIA CUDA
        – ATI Brook+
        – ClearSpeed Cn
      • No software portability, obsolescence risk.
  20. OpenCL (I)
      “OpenCL (Open Computing Language) is an open, royalty-free standard for general-purpose parallel programming of heterogeneous systems. OpenCL provides a uniform programming environment for software developers to write efficient, portable code for high-performance compute servers, desktop computer systems and handheld devices using a diverse mix of multi-core CPUs, GPUs, Cell-type architectures and other parallel processors such as DSPs.”
  21. OpenCL (II)
      • Allows you to write C-like code which executes on GPUs and many other devices
        – CPUs, FPGAs, various other architectures
      • Key point is data parallelism: applying the same function to a large amount of data
      • Allows us to leverage devices like GPUs from Erlang easily with a minimal wrapper
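To make "applying the same function to a large amount of data" concrete, here is a minimal sketch in plain Erlang. It is only an analogy for the data-parallel model, not OpenCL and not the wrapper mentioned on the slide; the module name `data_parallel` and the `brighten/1` "kernel" are hypothetical.

```erlang
-module(data_parallel).
-export([pmap/2, brighten/1]).

%% Apply F to every element of List, one process per element,
%% and return the results in the original order.
pmap(F, List) ->
    Parent = self(),
    Refs = [begin
                Ref = make_ref(),
                spawn_link(fun() -> Parent ! {Ref, F(X)} end),
                Ref
            end || X <- List],
    %% Collect results by reference so ordering is preserved.
    [receive {Ref, Result} -> Result end || Ref <- Refs].

%% Hypothetical "kernel": brighten one 8-bit pixel, saturating at 255.
brighten(Pixel) ->
    min(Pixel + 40, 255).
```

Calling `data_parallel:pmap(fun data_parallel:brighten/1, [0, 100, 250])` returns `[40, 140, 255]`. An accelerator does the same kind of work with one lightweight work-item per element instead of one Erlang process.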
  22. The Parallella Board
  23. Shiny prototype!
  24. The Parallella Board
  25. Epiphany Architecture
  26. Epiphany-IV 64-core 28nm (E64G401)
      • 64 High Performance RISC CPU Cores
      • 800 MHz Operating Frequency
      • 100 GFLOPS Peak Performance
      • 1.6 TB/s Local Memory Bandwidth
      • 102 GB/s Network-On-Chip Bisection Bandwidth
      • 6.4 GB/s Off-Chip Bandwidth
      • 2 MB On-Chip Distributed Shared Memory
      • 2 Watt Maximum Chip Power Consumption
      • IEEE Floating Point Instruction Set
      • Fully-featured ANSI-C/C++ programmable
      • GNU/Eclipse based tool chain
      • Source synchronous LVDS off chip links for host or direct chip-to-chip interfacing.
      • Chip to chip links for integrating up to 64 chips on a single board
  27. Parallella Vision Demo - Overview
  28. Parallella Vision Demo - Cameras
  29. Parallella Vision Demo - Architecture
  30. OpenCL and Erlang
      • Erlang is not that great for crunching image data.
        – This is where OpenCL fits in.
      • Erlang provides an environment around OpenCL. Our server implementation collects frames, offloads processing to the Epiphany and sends the results back.
        – Low-latency distributed communications and message passing between processes and nodes
        – Monitoring and supervision facilities
        – “Glue” between heterogeneous nodes
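The collect, offload and reply flow described above might be sketched as a small `gen_server`. This is not the code from the vision demo: the module name `frame_server`, the message shapes and `ProcessFun` (a stand-in for the OpenCL offload step) are all assumptions made for illustration.

```erlang
-module(frame_server).
-behaviour(gen_server).

-export([start_link/1, submit/2]).
-export([init/1, handle_call/3, handle_cast/2, handle_info/2]).

%% ProcessFun is a placeholder for the offload step (an assumption):
%% it takes a frame as a binary and returns a result.
start_link(ProcessFun) when is_function(ProcessFun, 1) ->
    gen_server:start_link(?MODULE, ProcessFun, []).

%% Hand a frame to the server without blocking. The result arrives
%% later in the caller's mailbox as {frame_result, Ref, Result}.
submit(Server, Frame) when is_binary(Frame) ->
    Ref = make_ref(),
    gen_server:cast(Server, {frame, self(), Ref, Frame}),
    Ref.

init(ProcessFun) ->
    {ok, ProcessFun}.

%% Process one frame and send the result back to whoever submitted it.
handle_cast({frame, From, Ref, Frame}, ProcessFun) ->
    From ! {frame_result, Ref, ProcessFun(Frame)},
    {noreply, ProcessFun}.

handle_call(_Request, _From, State) ->
    {reply, {error, unsupported}, State}.

handle_info(_Msg, State) ->
    {noreply, State}.
```

Because the server is started with `start_link/1`, it can sit under an OTP supervisor, and `submit/2` works unchanged when `Server` is a pid on another node. Those are the monitoring and "glue" roles listed on the slide.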
  31. OpenCL on the Parallella
      • The Parallella is a little different from standard GPUs
        – Work sizes are different (fewer cores than a GPU)
        – Structuring your kernels requires some forethought
  32. Parallella and Erlang
      • Ubuntu armhf packages up and running
        – Will be included in the standard distro image
      • Vision Demo code available now
        – https://github.com/esl/parcv
  33. Embedded Landscape
  34. #include <stats.h>
      Source: http://embedded.com/electronics-blogs/programming-pointers/4372180/Unexpected-trends
  35. External Interfaces in Erlang
  36. Accessing hardware
      • Peripherals are memory mapped
      • Access via /dev/mem…
        – Faster, needs root, potentially dangerous!
      • …or by kernel modules/sysfs
        – Slower, doesn’t need root, easier, relatively safer
      Generally very messy…
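As an illustration of the sysfs route, the sketch below drives a GPIO pin from Erlang through the legacy Linux sysfs GPIO files (`export`, `gpioN/direction`, `gpioN/value`). The module name `sysfs_gpio` is hypothetical and this is not how Erlang/ALE is implemented. The base directory is passed in explicitly, so on real hardware it would be "/sys/class/gpio".

```erlang
-module(sysfs_gpio).
-export([export/2, set_direction/3, write/3, read/2]).

%% Base is the sysfs GPIO directory, normally "/sys/class/gpio".

%% Ask the kernel to expose the pin as Base/gpioN.
export(Base, Pin) ->
    file:write_file(filename:join(Base, "export"), integer_to_list(Pin)).

%% Configure the pin as an input (in) or an output (out).
set_direction(Base, Pin, Dir) when Dir =:= in; Dir =:= out ->
    file:write_file(pin_file(Base, Pin, "direction"), atom_to_list(Dir)).

%% Drive an output pin low (0) or high (1).
write(Base, Pin, Value) when Value =:= 0; Value =:= 1 ->
    file:write_file(pin_file(Base, Pin, "value"), integer_to_list(Value)).

%% Read the current level of the pin.
read(Base, Pin) ->
    case file:read_file(pin_file(Base, Pin, "value")) of
        {ok, Bin} -> {ok, binary_to_integer(string:trim(Bin))};
        {error, _} = Error -> Error
    end.

pin_file(Base, Pin, Name) ->
    filename:join([Base, "gpio" ++ integer_to_list(Pin), Name]).
```

Every access is an ordinary file operation, which is why this route is easier and safer than mapping /dev/mem, and also why it is slower. Pointing `Base` at a scratch directory lets the module be exercised without hardware.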
  37. Introducing… Erlang/ALE
      Actor Library for Embedded
      http://github.com/esl/erlang-ale
  38. Erlang/ALE
      • Brings embedded peripheral interfaces into the Erlang domain
      • Provides easy to use, familiar abstractions for Erlang programmers
      • Uses Raspberry Pi as reference platform, easy to port to other embedded platforms
      • Open source (Apache version 2)
  39. Beta release
      • Based on pihwm
        – http://omerk.github.io/pihwm
      • GPIO and GPIO interrupts, SPI, I2C and PWM peripherals supported
      • Documentation, supporting material and educational package under development
  40. ALE Example: Blink!
        %% Setup fragment (runs inside an initialisation function):
        {ok, _} = gpio:start_link(?LED_PIN, output),

        %% Switch the LED on for one second, then off for one second.
        blink() ->
            gpio:write(?LED_PIN, 1),
            timer:sleep(1000),
            gpio:write(?LED_PIN, 0),
            timer:sleep(1000).
  41. ALE Example: Interrupts
        %% Setup fragment: request a message on every rising edge of the input pin.
        {ok, _} = gpio:start_link(?IN_PIN, input),
        ok = gpio:set_int(?IN_PIN, rising),

        %% gen_server callback: blink once per interrupt, then keep the state.
        handle_info({gpio_interrupt, _Pin, _Condition}, State) ->
            blink(),
            {noreply, State}.
  42. Hardware Projects – Demo Board
  43. Packages for Embedded Architectures
      https://www.erlang-solutions.com/downloads/download-erlang-otp
  44. Erlang
  45. Thank you
      • http://erlang-embedded.com
      • embedded@erlang-solutions.com
      • @ErlangEmbedded
      “The world is concurrent. Things in the world don't share data. Things communicate with messages. Things fail.”
      - Joe Armstrong, Father of Erlang