Cloud, Distributed, Embedded.
Erlang in the Heterogeneous Computing World

Omer Kilic || @OmerK
omer@erlang-solutions.com

  1. Cloud, Distributed, Embedded. Erlang in the Heterogeneous Computing World
     Omer Kilic || @OmerK
     omer@erlang-solutions.com
  2. Outline
     • Challenges in modern computing systems
     • Heterogeneous computing
     • Co-processors and accelerators
     • Programming models and tools
     • Alternate architectures
     • Parallella Vision System
     • Erlang Embedded Project
     • Q&A
     10/12/2013 Build Stuff 2013 Slide 2 of 46
  3. Challenges: Software
     • Frequency wall
     • Memory bottlenecks
     • Software complexity
  4. Amdahl’s Law
     • “…the maximum speed-up through parallel processing is set by the amount of code which has to run serial”
  5. Challenges: Hardware
     • Yield issues
     • Wiring and interconnect
     • Thermal density
     • Power consumption
     End of Moore’s law imminent…
  6. Challenges
     “With nearly 10 billion devices connected to the internet and predictions for exponential growth, we’ve reached a point where the space, power, and cost demands of traditional technology are no longer sustainable.”
     Meg Whitman, President and CEO, HP
  7. Internet of Things
  8. Device Architectures (I)
  9. Device Architectures (II)
  10. Heterogeneous Computing (I)
      • Special purpose, highly specialised architectures will outperform general purpose processing devices
        – Possibly by orders of magnitude
        – In terms of energy efficiency as well as raw speed
        – Parallel execution is key
      • Non-programmable/pseudo-programmable accelerators: ASIC, DSP, GPU, …
      • Fully programmable accelerators: FPGAs
  11. Open Compute Project
  12. Heterogeneous Computing (II)
  13. GPUs
  14. Anatomy of a GPU
  15. Co-processors: NetFPGA 10G
  16. Co-processors: Generic COTS devices
  17. Landscape of accelerator programming
      Interface    CUDA                    OpenCL                          DirectCompute       RenderScript
      Originator   NVIDIA                  Khronos (Apple)                 Microsoft           Google
      Year         2007                    2008                            2009                2011
      Area         HPC, desktop            Desktop, mobile, embedded, HPC  Desktop             Mobile
      OS           Windows, Linux, Mac OS  Windows, Linux, Mac OS (10.6+)  Windows (Vista+)    Android (3.0+)
      Devices      GPUs (NVIDIA)           CPUs, GPUs, custom              GPUs (NVIDIA, AMD)  CPUs, GPUs, DSPs
      Work unit    Kernel                  Kernel                          Compute shader      Compute script
      Language     CUDA C/C++              OpenCL C                        HLSL                Script C
      Distributed  Source, PTX             Source                          Source, bytecode    LLVM bitcode
      From: “The landscape of accelerator programming: a view from ARM”, Lokhmotov, A., 3rd UK GPU Computing Conference, London
  18. Accelerator types
      • Programmable accelerators
        – CPU Vector extensions: x86/SSE/AVX, PowerPC/VMX, ARM/NEON
        – GPUs supporting general-purpose computing (GPGPUs)
        – Sony/Toshiba/IBM Cell (Sony PlayStation 3, HPC)
        – ClearSpeed CSX (HPC, embedded)
        – Adapteva Epiphany (HPC, mobile)
        – Intel MIC (HPC)
  19. Programming accelerators
      • Proprietary low-level APIs, typically C-based:
        – Vector intrinsics
        – NVIDIA CUDA
        – ATI Brook+
        – ClearSpeed Cn
      • No software portability, obsolescence risk.
  20. OpenCL (I)
      “OpenCL (Open Computing Language) is an open, royalty-free standard for general-purpose parallel programming of heterogeneous systems. OpenCL provides a uniform programming environment for software developers to write efficient, portable code for high-performance compute servers, desktop computer systems and handheld devices using a diverse mix of multi-core CPUs, GPUs, Cell-type architectures and other parallel processors such as DSPs.”
  21. OpenCL (II)
      • Allows you to write C-like code which executes on GPUs and many other devices
        – CPUs, FPGAs, various other architectures
      • Key point is data parallelism: applying the same function to a large amount of data
      • Allows us to leverage devices like GPUs from Erlang easily with a minimal wrapper
  22. The Parallella Board
  23. Shiny prototype!
  24. The Parallella Board
  25. Epiphany Architecture
  26. Epiphany-IV 64-core 28nm (E64G401)
      • 64 High Performance RISC CPU Cores
      • 800 MHz Operating Frequency
      • 100 GFLOPS Peak Performance
      • 1.6 TB/s Local Memory Bandwidth
      • 102 GB/s Network-On-Chip Bisection Bandwidth
      • 6.4 GB/s Off-Chip Bandwidth
      • 2 MB On-Chip Distributed Shared Memory
      • 2 Watt Maximum Chip Power Consumption
      • IEEE Floating Point Instruction Set
      • Fully-featured ANSI-C/C++ programmable
      • GNU/Eclipse based tool chain
      • Source synchronous LVDS off chip links for host or direct chip-to-chip interfacing.
      • Chip to chip links for integrating up to 64 chips on a single board
  27. Parallella Vision Demo - Overview
  28. Parallella Vision Demo - Cameras
  29. Parallella Vision Demo - Architecture
  30. OpenCL and Erlang
      • Erlang is not that great for crunching image data.
        – This is where OpenCL fits in.
      • Erlang provides an environment around OpenCL. Our server implementation collects frames, offloads processing to Epiphany and sends results back.
        – Low latency distributed communications and message passing between processes and nodes
        – Monitoring and supervision facilities
        – “Glue” between heterogeneous nodes
  31. OpenCL on the Parallella
      • Parallella is a little different from standard GPUs
        – Work sizes are different (fewer cores than a GPU)
        – Requires some forethought when structuring your kernels
  32. Parallella and Erlang
      • Ubuntu armhf packages up and running
        – Will be included in the standard distro image
      • Vision Demo code available now
        – https://github.com/esl/parcv
  33. Embedded Landscape
  34. #include <stats.h>
      Source: http://embedded.com/electronics-blogs/programming-pointers/4372180/Unexpected-trends
  35. External Interfaces in Erlang
  36. Accessing hardware
      • Peripherals are memory mapped
      • Access via /dev/mem…
        – Faster, needs root, potentially dangerous!
      • …or by kernel modules/sysfs
        – Slower, doesn’t need root, easier, relatively safe
      Generally very messy…
  37. Introducing…
      Erlang/ALE
      Actor Library for Embedded
      http://github.com/esl/erlang-ale
  38. Erlang/ALE
      • Brings embedded peripheral interfaces into the Erlang domain
      • Provides easy-to-use, familiar abstractions for Erlang programmers
      • Uses Raspberry Pi as reference platform, easy to port to other embedded platforms
      • Open source (Apache version 2)
  39. Beta release
      • Based on pihwm
        – http://omerk.github.io/pihwm
      • GPIO and GPIO interrupts, SPI, I2C and PWM peripherals supported
      • Documentation, supporting material and educational package under development
  40. ALE Example: Blink!
      % Initialisation code: start the GPIO process for the LED pin as an output.
      {ok, _} = gpio:start_link(?LED_PIN, output),
      % Switch the LED on for one second, then off for one second.
      blink() ->
          gpio:write(?LED_PIN, 1),
          timer:sleep(1000),
          gpio:write(?LED_PIN, 0),
          timer:sleep(1000).
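  41. ALE Example: Interrupts
      % Initialisation code: configure the pin as an input and request rising-edge interrupts.
      {ok, _} = gpio:start_link(?IN_PIN, input),
      ok = gpio:set_int(?IN_PIN, rising),
      % Interrupts arrive as messages; a gen_server-style callback has to return {noreply, State}.
      handle_info({gpio_interrupt, _Pin, _Condition}, State) ->
          blink(),
          {noreply, State}.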
  42. Hardware Projects – Demo Board
  43. Packages for Embedded Architectures
      https://www.erlang-solutions.com/downloads/download-erlang-otp
  44. Erlang
  45. Thank you
      • http://erlang-embedded.com
      • embedded@erlang-solutions.com
      • @ErlangEmbedded
      “The world is concurrent.
      Things in the world don't share data.
      Things communicate with messages.
      Things fail.”
      - Joe Armstrong, Father of Erlang
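
The "Amdahl’s Law" slide states the law only in words. As a rough illustration, the commonly quoted form, speed-up = 1 / ((1 - P) + P / N) for a parallelisable fraction P running on N cores, can be evaluated in a few lines of Erlang. The module name `amdahl` and its function names are made up for this sketch and do not come from the talk.

```erlang
-module(amdahl).
-export([speedup/2, limit/1]).

%% Speed-up predicted by Amdahl's Law.
%% P: fraction of the run time that can be parallelised (0 =< P =< 1).
%% N: number of processing elements (integer, N >= 1).
speedup(P, N) when is_number(P), P >= 0, P =< 1, is_integer(N), N >= 1 ->
    1 / ((1 - P) + P / N).

%% Upper bound on the speed-up as N grows without limit: 1 / (1 - P).
%% Only defined while some serial work remains (P < 1).
limit(P) when is_number(P), P >= 0, P < 1 ->
    1 / (1 - P).
```

For example, `amdahl:speedup(0.9, 64)` comes out at roughly 8.8, while `amdahl:limit(0.9)` is about 10: with 10% serial code, even a 64-core device stays below a tenfold gain.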
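
The "OpenCL and Erlang" slide describes a server that collects frames, offloads the processing and sends results back. A minimal sketch of that message flow is shown here, under loud assumptions: the module `frame_server`, its message shapes and the 5 second timeout are invented for illustration, and the offload step is a plain fun standing in for whatever OpenCL wrapper is actually used. It is not the parcv implementation.

```erlang
-module(frame_server).
-export([start/1, process/2, stop/1]).

%% Start a server process. ProcessFun stands in for the offload step
%% (for example a call into an OpenCL wrapper); here it is any fun of arity 1.
start(ProcessFun) when is_function(ProcessFun, 1) ->
    spawn(fun() -> loop(ProcessFun) end).

%% Send one frame to the server and wait for the matching result.
process(Server, Frame) ->
    Ref = make_ref(),
    Server ! {frame, self(), Ref, Frame},
    receive
        {result, Ref, Result} -> {ok, Result}
    after 5000 ->
        {error, timeout}
    end.

%% Ask the server to terminate.
stop(Server) ->
    Server ! stop,
    ok.

%% Collect a frame, hand it to the processing fun, send the result back.
loop(ProcessFun) ->
    receive
        {frame, From, Ref, Frame} ->
            From ! {result, Ref, ProcessFun(Frame)},
            loop(ProcessFun);
        stop ->
            ok
    end.
```

A stand-in "kernel" that inverts 8-bit pixels could be passed as `fun(Bin) -> << <<(255 - B)>> || <<B>> <= Bin >> end`. In a real system this loop would more likely be a gen_server under a supervisor, which is where the monitoring and supervision facilities mentioned on the slide come in.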
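
The "Accessing hardware" slide contrasts /dev/mem with the kernel modules/sysfs route. To give a feel for the sysfs side, here is a hedged sketch that drives a pin through the legacy Linux /sys/class/gpio files using only the standard `file` and `filename` modules. The module name `sysfs_gpio` and its functions are hypothetical; this is not how Erlang/ALE or pihwm is implemented, and the calls only succeed on a board whose kernel exposes that interface.

```erlang
-module(sysfs_gpio).
-export([export/1, set_direction/2, write/2, path/2]).

-define(GPIO_ROOT, "/sys/class/gpio").

%% Build the sysfs path for an attribute of a pin, e.g. path(17, "value").
path(Pin, Attr) when is_integer(Pin), Pin >= 0 ->
    filename:join([?GPIO_ROOT, "gpio" ++ integer_to_list(Pin), Attr]).

%% Ask the kernel to expose the pin under /sys/class/gpio/gpioN.
export(Pin) when is_integer(Pin), Pin >= 0 ->
    file:write_file(filename:join(?GPIO_ROOT, "export"), integer_to_list(Pin)).

%% Direction is the atom in or out.
set_direction(Pin, Dir) when Dir =:= in; Dir =:= out ->
    file:write_file(path(Pin, "direction"), atom_to_list(Dir)).

%% Value is 0 or 1.
write(Pin, Value) when Value =:= 0; Value =:= 1 ->
    file:write_file(path(Pin, "value"), integer_to_list(Value)).
```

Each call returns `ok` or `{error, Reason}` from `file:write_file/2`, so a missing interface or a permissions problem shows up as a normal Erlang error tuple. Every operation is a file write, which fits the slide's point that this path is slower but easier and safer than mapping /dev/mem.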