Amdahl’s Law
• “…the maximum speed-up through parallel
processing is set by the amount of code which
has to run serial”
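The limit quoted above is usually written as speedup = 1 / (s + (1 - s) / n), where s is the serial fraction of the program and n is the number of parallel workers. A minimal sketch of that formula follows; the function name and the example figures are illustrative and do not come from the talk.

```python
def amdahl_speedup(serial_fraction: float, workers: int) -> float:
    """Theoretical speed-up for a program under Amdahl's Law.

    serial_fraction: share of the run time that cannot be parallelised (0..1).
    workers: number of parallel processing units (>= 1).
    """
    if not 0.0 <= serial_fraction <= 1.0:
        raise ValueError("serial_fraction must be between 0 and 1")
    if workers < 1:
        raise ValueError("workers must be at least 1")
    parallel_fraction = 1.0 - serial_fraction
    return 1.0 / (serial_fraction + parallel_fraction / workers)


if __name__ == "__main__":
    # Illustrative figures only: 10% serial code on a growing core count.
    for n in (1, 4, 16, 64, 1024):
        print(n, round(amdahl_speedup(0.1, n), 2))
```

With 10% serial code, 64 workers give a speed-up of about 8.8x, and no number of workers can push it past 10x.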
10/12/2013
Build Stuff 2013
Slide 4 of 46
Challenges: Hardware
• Yield issues
• Wiring and interconnect
• Thermal density
• Power consumption
End of Moore’s law imminent…
Challenges
“With nearly 10 billion devices connected to the
internet and predictions for exponential growth,
we’ve reached a point where the space, power,
and cost demands of traditional technology are
no longer sustainable.”
Meg Whitman
President and CEO, HP
Heterogeneous Computing (I)
• Special purpose, highly specialised
architectures will outperform general purpose
processing devices
– Possibly by orders of magnitude
– In terms of energy efficiency as well as raw speed
– Parallel execution is key
• Non-programmable/pseudo-programmable
accelerators: ASIC, DSP, GPU, …
• Fully programmable accelerators: FPGAs
Landscape of accelerator programming
Interface    CUDA                    OpenCL                          DirectCompute       RenderScript
Originator   NVIDIA                  Khronos (Apple)                 Microsoft           Google
Year         2007                    2008                            2009                2011
Area         HPC, desktop            Desktop, mobile, embedded, HPC  Desktop             Mobile
OS           Windows, Linux, Mac OS  Windows, Linux, Mac OS (10.6+)  Windows (Vista+)    Android (3.0+)
Devices      GPUs (NVIDIA)           CPUs, GPUs, custom              GPUs (NVIDIA, AMD)  CPUs, GPUs, DSPs
Work unit    Kernel                  Kernel                          Compute shader      Compute script
Language     CUDA C/C++              OpenCL C                        HLSL                Script C
Distributed  Source, PTX             Source                          Source, bytecode    LLVM bitcode
From: “The landscape of accelerator programming: a view from ARM”, Lokhmotov, A.,
3rd UK GPU Computing Conference, London
Programming accelerators
• Proprietary low-level APIs, typically C-based:
– Vector intrinsics
– NVIDIA CUDA
– ATI Brook+
– ClearSpeed Cn
• No software portability, obsolescence risk.
OpenCL (I)
“OpenCL (Open Computing Language) is an open,
royalty-free standard for general-purpose parallel
programming of heterogeneous systems. OpenCL
provides a uniform programming environment for
software developers to write efficient, portable code for
high-performance compute servers, desktop computer
systems and handheld devices using a diverse mix of
multi-core CPUs, GPUs, Cell-type architectures and
other parallel processors such as DSPs.”
OpenCL (II)
• Allows you to write C-like code which executes
on GPUs and many other devices
– CPUs, FPGAs, various other architectures
• Key point is data parallelism: applying the
same function to a large amount of data
• Allows us to leverage devices like GPUs from
Erlang easily with a minimal wrapper
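The data-parallel idea above, one function applied independently to every element of a large data set, can be sketched in a few lines. This is not OpenCL and not the Erlang wrapper from the talk; `brighten` and `data_parallel_map` are hypothetical names, and a thread pool stands in for the accelerator's work-items purely to show the shape of the computation.

```python
from concurrent.futures import ThreadPoolExecutor


def brighten(pixel: int) -> int:
    """The 'kernel': a pure function of a single element (here, one 8-bit pixel)."""
    return min(pixel + 40, 255)


def data_parallel_map(kernel, data, workers: int = 4):
    """Apply the same kernel to every element; no element depends on another.

    pool.map preserves input order, so the result lines up with the input.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(kernel, data))


if __name__ == "__main__":
    print(data_parallel_map(brighten, [0, 100, 250]))
```

Because each call touches only its own element, the work can be spread over as many execution units as the device offers. In CPython the thread pool will not speed up CPU-bound work; it only illustrates the structure.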
Epiphany-IV 64-core 28nm (E64G401)
• 64 High Performance RISC CPU Cores
• 800 MHz Operating Frequency
• 100 GFLOPS Peak Performance
• 1.6 TB/s Local Memory Bandwidth
• 102 GB/s Network-On-Chip Bisection Bandwidth
• 6.4 GB/s Off-Chip Bandwidth
• 2 MB On-Chip Distributed Shared Memory
• 2 Watt Maximum Chip Power Consumption
• IEEE Floating Point Instruction Set
• Fully-featured ANSI-C/C++ programmable
• GNU/Eclipse based tool chain
• Source synchronous LVDS off-chip links for host or direct chip-to-chip interfacing
• Chip-to-chip links for integrating up to 64 chips on a single board
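The headline figures above can be cross-checked with some simple arithmetic. The sketch below assumes each core completes 2 floating-point operations per cycle (for example one fused multiply-add); that figure is an assumption and is not stated on the slide. All variable names are illustrative.

```python
# Figures taken from the specification list above.
CORES = 64
CLOCK_HZ = 800e6            # 800 MHz
LISTED_PEAK_GFLOPS = 100    # as quoted on the slide
ON_CHIP_MEMORY_KB = 2 * 1024  # 2 MB distributed shared memory
MAX_POWER_WATTS = 2

# ASSUMPTION (not from the slide): 2 floating-point operations per core per cycle.
FLOPS_PER_CYCLE = 2

# Peak throughput implied by core count, clock and the assumed FLOPs per cycle.
peak_gflops = CORES * CLOCK_HZ * FLOPS_PER_CYCLE / 1e9

# Share of the on-chip memory available to each core if split evenly.
memory_per_core_kb = ON_CHIP_MEMORY_KB // CORES

# Energy efficiency using the listed peak and maximum power figures.
gflops_per_watt = LISTED_PEAK_GFLOPS / MAX_POWER_WATTS

if __name__ == "__main__":
    print(f"implied peak: {peak_gflops:.1f} GFLOPS")
    print(f"memory per core: {memory_per_core_kb} KB")
    print(f"efficiency: {gflops_per_watt:.0f} GFLOPS/W")
```

Under that assumption the implied peak is 102.4 GFLOPS, in line with the listed 100 GFLOPS. An even split of the 2 MB gives 32 KB per core, which is the budget a kernel's working set has to fit into.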
OpenCL and Erlang
• Erlang is not that great for crunching image data.
– This is where OpenCL fits in.
• Erlang provides an environment around OpenCL.
Our server implementation collects frames,
offloads processing to Epiphany and sends results
back.
– Low latency distributed communications and
message passing between processes and nodes
– Monitoring and supervision facilities
– “Glue” between heterogeneous nodes
OpenCL on the Parallella
• Parallella is a little different from standard
GPUs
– Work sizes are different (far fewer cores
than a GPU)
– Requires some forethought into structuring your
kernels
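One way to picture the forethought mentioned above: with only tens of cores, each core typically takes a contiguous slice of the data rather than a single element. The sketch below splits the rows of a frame across a fixed number of cores. It is an illustration only, not the scheme used in the talk's demo; `partition_rows` and the 480-row, 16-core figures are hypothetical.

```python
def partition_rows(total_rows: int, cores: int):
    """Split total_rows into `cores` contiguous (start, end) ranges.

    Ranges are half-open [start, end). When the rows do not divide evenly,
    the first few cores each take one extra row.
    """
    if cores < 1:
        raise ValueError("cores must be at least 1")
    if total_rows < 0:
        raise ValueError("total_rows must not be negative")
    base, extra = divmod(total_rows, cores)
    ranges = []
    start = 0
    for core in range(cores):
        count = base + (1 if core < extra else 0)
        ranges.append((start, start + count))
        start += count
    return ranges


if __name__ == "__main__":
    # Hypothetical example: a 480-row frame on a 16-core device.
    for core, (start, end) in enumerate(partition_rows(480, 16)):
        print(f"core {core}: rows {start}..{end - 1}")
```

Each core then loops over its own range inside the kernel, so the work size matches the core count instead of the pixel count.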
Parallella and Erlang
• Ubuntu armhf packages up and running
– Will be included in the standard distro image
• Vision Demo code available now
– https://github.com/esl/parcv
Erlang/ALE
• Brings embedded peripheral interfaces into
the Erlang domain
• Provides easy to use, familiar abstractions for
Erlang programmers
• Uses Raspberry Pi as the reference platform; easy
to port to other embedded platforms
• Open source (Apache License, Version 2.0)
Beta release
• Based on pihwm
– http://omerk.github.io/pihwm
• GPIO and GPIO interrupts, SPI, I2C and PWM
peripherals supported
• Documentation, supporting material and
educational package under development
Thank you
• http://erlang-embedded.com
• embedded@erlang-solutions.com
• @ErlangEmbedded
“The world is concurrent.
Things in the world don't share data.
Things communicate with messages.
Things fail.”
- Joe Armstrong
Father of Erlang