Cockatrice is a hardware design environment for designing hardware circuits from Elixir code. It synthesizes Elixir code written in the "Zen style", which describes dataflow with Enum, Flow, and the pipeline operator, into a hardware description language (HDL) representation of a dataflow circuit. The synthesis flow analyzes the Elixir code, generates hardware modules from functions, connects them into a dataflow circuit, and outputs the final circuit description together with an interface driver for communication between the generated hardware and an Elixir software application. This makes it possible to accelerate parts of Elixir code by offloading processing to customized hardware circuits designed from that code.
2. Who am I?
@takasehideki
− Assistant Professor at Kyoto University
− Researcher at the PRESTO program, Japan Science and Technology Agency (JST)
My Research Topics/Interests
− System-level design for embedded real-time systems
− IoT computing architecture
SW/HW codesign for processors and FPGAs!
− and... Elixir for IoT!!!
3. Thanks to... with Wabi-Sabi
• My students in the lab
− Kentaro Matsui
− Yasuhiro Nitta
• My research partners at fukuoka.ex
− @zacky1972
− @hisawayex
− @piacere_ex
− @enpedasi
• My friends at hls-friends
− Tech community for self-made high-level synthesis tools
4. Introduction to HW Design on FPGA
• What is an FPGA?
• Advantages, applications, and design flow
• High-level synthesis and high-level design
5. Computational Resources
• Processor with software: "do whatever is instructed"
− universal data path to cover all application cases
− fixed data width and arithmetic units
• ASIC as hardware: "do what is needed immediately"
− application-specific architecture
− dedicated data width and customized units as needed
7. What is an FPGA?
• Field Programmable Gate Array
− LSIs whose contents can be changed at any time
− We can design a unique digital circuit (HW) on it
− Two major vendors: Xilinx and Altera (now part of Intel)
[Figure: FPGA internal architecture, a two-dimensional array of logic blocks (LB), connection blocks (CB), switching blocks (SB), and I/O blocks (IOB). Each logic block contains a LUT and a D flip-flop (D-FF, input D, output Q).]
LUT (4 inputs)
IN    OUT
0000  1
0001  0
0010  0
…     …
1110  1
1111  0
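The LUT above is just a truth table held in configuration memory. Here is a minimal C sketch, not vendor code. It assumes a 4-input LUT whose 16 outputs are packed into one 16-bit configuration word, where bit i is the output for input value i. A D-FF then captures the LUT output on each clock edge. The configuration 0x6996 (4-input XOR) used in the checks is a hypothetical example chosen because every row is known. It is not the slide's table, whose middle rows are elided.

```c
#include <assert.h>
#include <stdint.h>

/* 4-input LUT: 16-entry truth table packed into a 16-bit config word.
 * Bit i of `config` is the output for input value i (0000 .. 1111). */
typedef struct {
    uint16_t config;
} lut4_t;

static int lut4_eval(const lut4_t *lut, unsigned in) {
    assert(in < 16u);                        /* only 4 input bits */
    return (int)((lut->config >> in) & 1u);  /* select one table row */
}

/* D flip-flop: Q takes the value of D at each rising clock edge. */
typedef struct {
    int q;
} dff_t;

static void dff_clock(dff_t *ff, int d) {
    ff->q = d;
}

/* Simplified logic block: LUT output registered by the D-FF. */
typedef struct {
    lut4_t lut;
    dff_t ff;
} logic_block_t;

/* One clock cycle of the logic block with 4-bit input `in`. */
static void lb_clock(logic_block_t *lb, unsigned in) {
    dff_clock(&lb->ff, lut4_eval(&lb->lut, in));
}
```

Reprogramming the FPGA amounts to writing different configuration words (and routing choices). The evaluation logic itself never changes.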
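8. Common Design Flow
Flow: RTL description → RTL simulation → logic synthesis → technology mapping → placement and routing → bitstream generation; post-layout simulation verifies the placed-and-routed design
• Design by Hardware Description Language (HDL)
module hdl_example (
  input  wire       clk,
  input  wire       rst,  // active-low reset
  input  wire [3:0] in,
  input  wire [7:0] r,    // 8-bit data input whose upper nibble feeds `out`
  output wire [7:0] out
);
  reg [3:0] tmp;
  always @(posedge clk) begin
    if (!rst) tmp <= 4'd0;  // reset the register; `out` is driven by the assign below
    else begin
      case (in)
        4'b0001 : tmp <= 4'd1;
        4'b0010 : tmp <= 4'd5;
        4'b1100 : tmp <= 4'd7;
        default : tmp <= 4'd0;
      endcase
    end
  end
  assign out = ~{r[7:4], tmp[3:0]};
endmodule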
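RTL simulation checks exactly this kind of description at cycle level. To make that concrete, here is a hedged C model of the HDL example, assuming the reset clears the `tmp` register and that `rst` is active-low. The 8-bit value whose upper nibble is concatenated into the output is passed as a hypothetical parameter `upper_src`. `rtl_posedge` models one rising clock edge and `rtl_out` models the combinational `assign`.

```c
#include <assert.h>
#include <stdint.h>

/* State of the example: only the 4-bit register `tmp` (low nibble used). */
typedef struct {
    uint8_t tmp;
} rtl_state_t;

/* One `always @(posedge clk)` evaluation; rst is active-low. */
static void rtl_posedge(rtl_state_t *s, int rst, uint8_t in) {
    if (!rst) {
        s->tmp = 0;
        return;
    }
    switch (in & 0x0Fu) {
    case 0x1: s->tmp = 1; break; /* 4'b0001 */
    case 0x2: s->tmp = 5; break; /* 4'b0010 */
    case 0xC: s->tmp = 7; break; /* 4'b1100 */
    default:  s->tmp = 0; break;
    }
}

/* Combinational output: ~{upper_src[7:4], tmp[3:0]}, 8 bits wide. */
static uint8_t rtl_out(const rtl_state_t *s, uint8_t upper_src) {
    uint8_t cat = (uint8_t)((upper_src & 0xF0u) | (s->tmp & 0x0Fu));
    return (uint8_t)~cat;
}
```

Comparing such a model cycle by cycle against the HDL simulation is one simple way to catch RTL bugs early.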
9. Common Design Flow
• EDA tools support almost everything!
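10. How to Use FPGAs
[Figure: a processor running SW and an FPGA running HW, connected by a communication bus. Heavy processing is offloaded from SW to HW; an interface (IF) circuit on the FPGA and an IF driver on the processor handle communication between SW and HW.]
• Offloading heavy processing to HW: performance improvement and low power consumption can be achieved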
11. Applications of FPGAs
• Used as LSIs for rapid design of ASICs
− Functionality of ASICs can be verified before production
− SW development can start before HW manufacturing is completed
• Used as LSIs for final products
− There are already many practical consumer products
− Functions can be rewritten after shipping
− Recent practical applications
Financial market transactions, robotics, and automotive
Data center / cloud servers
Machine Learning!!
12. Machine Learning on FPGAs
• Accelerators for CNNs/DNNs
− Neuron synapse values flow through the pipeline
Ref: K. Ovtcharov, et al., Toward Accelerating Deep Learning at Scale Using Specialized Hardware in the Datacenter, Hot Chips 27, 2015.
C. Zhang, et al., Optimizing FPGA-based Accelerator Design for Deep Convolutional Neural Networks, FPGA 2015.
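To illustrate how neuron synapse values flow through a pipeline, here is a hypothetical C model of one neuron built as a chain of multiply-accumulate (MAC) stages. Each stage holds one weight, and the partial sum passes from stage to stage. The function names and the ReLU activation are assumptions for illustration and are not taken from the cited accelerators. In HW, each stage would be its own MAC unit, and successive inputs could occupy different stages at once. This model simply evaluates the stages in order.

```c
#include <assert.h>
#include <stddef.h>

/* Activation applied after the last stage (assumed ReLU). */
static int relu(int v) { return v > 0 ? v : 0; }

/* Stage i adds w[i] * x[i] to the partial sum arriving from stage i-1. */
int neuron_pipeline(const int *w, const int *x, size_t n, int bias) {
    int psum = bias;                  /* value entering the first stage */
    for (size_t i = 0; i < n; i++) {
        psum += w[i] * x[i];          /* stage i: one MAC */
    }
    return relu(psum);
}
```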
13. Advantages of FPGAs
[Figure: an FPGA integrating memory and many functional units (Func).]
• Various systems can be designed onto one LSI
• High performance / low power consumption
• Parallel processing can be realized at task/data level
• Data streaming processing can be realized
14. Current Technology Trends
• Increase in circuit scale and number of LBs
− High-performance systems can be realized
− New technologies will keep increasing the scale: multi-die, 3D stacking, ...
• Tight coupling with processors
− General-purpose: connection to processors via PCIe
− Embedded: integration with embedded processors
Designing high-quality systems in a short time has become difficult...
15. High-Level Synthesis (HLS)
• A solution to improve design productivity!
− Technology for synthesizing HDL from behavioral descriptions written in a programming language
C/C++ or their extensions are commonly used
− The abstraction level of design becomes higher
int func(int x) {
  int a[N];
  int i;
  for (i = 0; i < N; i++) {
    a[i] = ...;
    /* ... */
  }
  /* ... */
}
[Figure: func is synthesized into a HW module with input x; the loop variable i becomes a register and the array a becomes a memory.]
16. Commercial HLS Tools
• Xilinx Vivado HLS
− Synthesizes from C/C++
− #pragma directives specify optimizations
• Intel SDK for OpenCL
− Synthesizes from parallelized OpenCL code
− The same description can also be executed on the host PC
Ref: Xilinx Inc., Vivado Design Suite User Guide: High-Level Synthesis (UG902).
D. Neto, Optimizing OpenCL for Altera FPGAs, Int'l Workshop on OpenCL, 2014.
To derive optimized hardware, a deep understanding of #pragma directives and libraries is essential
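As a hedged sketch of what HLS input looks like, the C function below sums the squares of a fixed-size array. It uses a pipelining directive in the Vivado HLS `#pragma HLS PIPELINE II=1` form, which asks for one new loop iteration per clock cycle. A regular C compiler ignores the pragma, though it may warn about an unknown pragma, so the same code also runs as software for functional checking. The function is only an example and does not come from the cited guides.

```c
#include <assert.h>

#define N 8

/* Sum of squares over a fixed-size array: a loop an HLS tool can turn into
 * a pipelined datapath (one multiplier feeding an accumulator register). */
int sum_of_squares(const int in[N]) {
    int acc = 0;
    for (int i = 0; i < N; i++) {
#pragma HLS PIPELINE II=1
        acc += in[i] * in[i];   /* one multiply-accumulate per iteration */
    }
    return acc;
}
```

Whether II=1 is actually achieved depends on the tool and target (here the loop-carried dependence on `acc` must fit in one cycle). This is why deep knowledge of the directives matters.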
17. Not only C/C++!!
• Chisel: Scala-based
− Object-oriented / functional-style DSL
• CλaSH: Haskell-based
− Synthesizes HDL from functional-language descriptions
• Karuta: original scripting language
• Synthesijer: Java-based
− HLS from a subset of the Java specification
• PyCoRAM, Polyphony: Python-based
Veriloggen: Python library for HDL design
• Mulvery: Ruby-based
− Synthesis from reactive programming
• Octopus🐙: OCaml-based
(developed by Japanese hls-friends!!)
18. OK, What We Want Is...
We want to design HW with Elixir!!
We want to operate HW from our Elixir code!!
19. Concept of Cockatrice
• Elixir Zen Style
• Why would Elixir be suitable for HW design?
• HW synthesis flow from Elixir code
• SW/HW communication interface
20. What is Cockatrice?
• A summoned beast that appears in FF4 (^^;
− Its effect is to turn all enemies to stone
• A hardware design environment with Elixir!
• Features
− It synthesizes Elixir Zen-styled code into HW circuit descriptions
− It provides a communication interface between Elixir code and HW circuits
Your Elixir code can be accelerated and made low-power!!
NOTE: The current Cockatrice logo is from Wikipedia
21. Elixir Zen Style
• Enum: transform data directly
• Flow: realize parallel processing intuitively (by MapReduce)
• |>: pipeline operator, expresses data flow intuitively
• Zen (禅) means essential beauty
− The essence of programming is data transformation
− Enum, Flow, and |> describe only data transformation
input_list
|> Flow.from_enumerable()
|> Flow.map(&foo(&1))
|> Flow.map(fn a -> -a end)
|> Enum.to_list()
|> Enum.sort()
22. Zen’s process model
input_list
|> Flow.from_enumerable(stages: 4)
|> Flow.map(&foo(&1))
|> Flow.map(fn a -> -a end)
|> Enum.to_list()
|> Enum.sort()
[Figure: process model: input_list enters from_enumerable, which feeds four parallel stages, each applying foo and then -a; an arbitrator merges the stage outputs into to_list, followed by sort.]
It’s similar to an efficient HW architecture!!
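The process model above can be mimicked sequentially in C to show why it maps well onto parallel HW. This is a sketch under several assumptions. `foo` is a hypothetical stand-in, since its body is not given, and elements are dealt round-robin to four lanes, whereas Flow actually dispatches by demand. Each lane applies `foo` and then negation, and the gather step visits lanes in an arbitrary (lane-major) order. The final sort makes the result independent of that order, which is also why the Elixir code ends with `Enum.sort`.

```c
#include <assert.h>
#include <stdlib.h>

#define STAGES 4

/* Hypothetical stand-in for the slide's foo/1, whose body is not given. */
static int foo(int x) { return x * x; }

/* One lane: the two chained map steps (foo, then negation). */
static int stage(int x) { return -foo(x); }

static int cmp_int(const void *a, const void *b) {
    int x = *(const int *)a, y = *(const int *)b;
    return (x > y) - (x < y);
}

/* Deal inputs round-robin to STAGES lanes, gather lane outputs
 * (arbitrator), then sort the collected list. */
void zen_pipeline(const int *in, size_t n, int *out) {
    size_t k = 0;
    for (size_t lane = 0; lane < STAGES; lane++) {
        for (size_t i = lane; i < n; i += STAGES) {
            out[k++] = stage(in[i]);   /* lane-major gather: arbitrary order */
        }
    }
    qsort(out, n, sizeof out[0], cmp_int);
}
```

In HW, the four lanes become four physical `foo → -a` pipelines running at the same time. In software they run one after another, but the result is the same.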
23. Zen is suitable for HW design!
We summon Cockatrice to lithify Elixir Zen-styled code as parallel HW stones!!
24. Effect of Cockatrice
[Figure: the Zen process model lithified into HW: the input list passes through from_enumerable into many parallel foo → -a pipelines, whose outputs an arbitrator merges for to_list and sort.]
25. HW Description by Elixir
• The defcockatrice part is treated as the HW description
− It is completely equivalent to native Elixir code
You do not need to consider HW design
It can be verified at the functional level
• HW modules can be called in the same way as SW functions
− We assume SW/HW cooperative systems
26. Synthesis Flow
• Code analysis & AST optimization: design description (Elixir) → AST and information about the description
• Synthesis of HW modules from Elixir functions: AST + templates for IP (DSL) → HW IP modules (HDL)
• Synthesis of data flow: HW IP modules → data flow HW circuit (HDL)
• Logic synthesis: data flow HW circuit → HW circuits (bitstream)
• Generation of device driver of I/F circuit → I/F driver (C NIF)
• Compilation of SW: SW app (Elixir + C NIF) → SW binary
27. Synthesis Flow: Code Analysis & AST Optimization
A metaprogramming approach is employed: the AST of the Zen-styled design description is derived with quote
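In Elixir, `quote do: a |> b() |> c()` yields a left-nested tree, because `|>` is left-associative. The outer pipe node's first argument is therefore the inner pipe. The C sketch below mirrors that shape with a hypothetical, heavily simplified node type and flattens a pipe chain into stages in dataflow order. Real Elixir AST nodes are `{name, meta, args}` tuples, and qualified calls such as `Flow.map` are nested further. The sketch illustrates the idea only and is not Cockatrice's analysis code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Simplified node: a pipe "|>" keeps its upstream expression in `left`
 * and its downstream call in `right`; other nodes carry only a name. */
typedef struct ast {
    const char *name;
    const struct ast *left;
    const struct ast *right;
} ast_t;

/* Write stage names source-first into out[]; return the stage count,
 * or -1 if more than `cap` stages would be needed. */
int flatten_pipe(const ast_t *node, const char **out, int cap) {
    if (node->name != NULL && strcmp(node->name, "|>") == 0) {
        int n = flatten_pipe(node->left, out, cap);  /* upstream first */
        if (n < 0 || n >= cap) return -1;
        out[n] = node->right->name;                  /* then this stage */
        return n + 1;
    }
    if (cap < 1) return -1;
    out[0] = node->name;                             /* the data source */
    return 1;
}
```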
28. Synthesis Flow: Synthesis of HW Modules from Elixir Functions
We provide HDL code templates equivalent to Enum functions as DSL files
HDL code is synthesized by pattern matching between the AST and the DSL
29. Synthesis Flow: Synthesis of Data Flow
Each module is connected as a data flow according to the AST representation of |> and Flow
Finally, a data flow HW circuit with parallel processing is synthesized!!
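Template matching and dataflow connection can be sketched in C as a table lookup followed by chaining. Everything here is hypothetical. The template names are made up, and a string-keyed table stands in for Cockatrice's pattern matching between the AST and its DSL files. Each stage's output is connected to the next stage's input in `|>` order, and a call with no template makes synthesis fail.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the DSL template files. */
typedef struct {
    const char *call;          /* call name as found in the AST */
    const char *hdl_template;  /* HDL module to instantiate (made-up names) */
} template_entry_t;

static const template_entry_t TEMPLATES[] = {
    { "Flow.from_enumerable", "fifo_splitter" },
    { "Flow.map",             "map_unit" },
    { "Enum.to_list",         "arbiter_collector" },
    { "Enum.sort",            "sorter" },
};

/* Match one call against the table; NULL if unsupported. */
static const char *match_template(const char *call) {
    for (size_t i = 0; i < sizeof TEMPLATES / sizeof TEMPLATES[0]; i++) {
        if (strcmp(TEMPLATES[i].call, call) == 0) return TEMPLATES[i].hdl_template;
    }
    return NULL;
}

/* Chain matched modules in |> order into "a -> b -> c" in buf.
 * Returns 0 on success, -1 on an unsupported call or a too-small buffer. */
int build_dataflow(const char *const *calls, size_t n, char *buf, size_t cap) {
    size_t used = 0;
    if (cap == 0) return -1;
    buf[0] = '\0';
    for (size_t i = 0; i < n; i++) {
        const char *t = match_template(calls[i]);
        if (t == NULL) return -1;
        int w = snprintf(buf + used, cap - used, "%s%s", i ? " -> " : "", t);
        if (w < 0 || (size_t)w >= cap - used) return -1;
        used += (size_t)w;
    }
    return 0;
}
```

A real flow would emit HDL instances and wires instead of a summary string, but the structure is the same: one template per call, connected in pipeline order.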
30. Synthesis Flow: Generation of the I/F Driver
The communication interface and its driver are generated as NIF functions
31. Synthesis Flow: Logic Synthesis & SW Compilation
The SW binary and HW bit files are compiled by their respective tools
32. SW/HW Communication Interface
• Activation and operation of HW from Elixir code
• Data communication between SW and HW
− The AXI4 bus on Zynq is used
• We implement the device driver as a C NIF module
− ikwzm/udmabuf is used for DMA transfer
− Elixir/Erlang lists must be converted to C arrays
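[Figure: on the processor, the Elixir app runs on the Erlang VM and calls the device driver (NIF module), which exchanges data through a DMA buffer with the interface circuit on the FPGA; the interface circuit connects to the HW circuits.]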
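Erlang lists are linked cons cells, whereas DMA needs one contiguous buffer. That is why the driver must convert lists to C arrays. A real NIF would walk the list with `enif_get_list_cell()` and read each element with `enif_get_int()` from `erl_nif.h`. To keep this sketch self-contained and testable, a hypothetical `cons_t` type stands in for an Erlang list, and `buf` stands in for the udmabuf-allocated DMA buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for an Erlang list: a singly linked cons cell. */
typedef struct cons {
    int32_t head;
    const struct cons *tail;  /* NULL marks the empty list [] */
} cons_t;

/* Flatten the list into a contiguous buffer (e.g. a DMA buffer).
 * Returns the element count, or -1 if the buffer is too small. */
long list_to_array(const cons_t *list, int32_t *buf, size_t cap) {
    size_t n = 0;
    for (const cons_t *c = list; c != NULL; c = c->tail) {
        if (n == cap) return -1;
        buf[n++] = c->head;
    }
    return (long)n;
}

/* Results coming back from HW: rebuild a list from the array,
 * using caller-provided storage for the cells. */
const cons_t *array_to_list(const int32_t *buf, size_t n, cons_t *cells) {
    const cons_t *list = NULL;
    for (size_t i = n; i-- > 0;) {  /* back-to-front, like prepending [h | t] */
        cells[i].head = buf[i];
        cells[i].tail = list;
        list = &cells[i];
    }
    return list;
}
```

This copy in both directions is part of the SW/HW communication overhead discussed below.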
36. Discussion
• Currently, we have only implemented prototypes
− We will publish them as Hex packages very soon...
− Currently supported features are limited
IOW, we only synthesize Zen-styled code
Are other Elixir/Erlang process models suitable for efficient HW architecture?
− Quantitative evaluation of our proposal will also be important (to verify the academic contribution ^^;
37. Discussion
• Applicable range of Cockatrice?
− Not only embedded, but also the HPC domain!?
Bigger data suits Cockatrice better, since SW/HW communication has some overhead
− AI/ML would be a killer application
Big data stream processing for IoT
Cloud processing that allows users to change functions flexibly
− We are planning to support large-scale FPGA boards with a communication interface for the PCIe bus
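The point that bigger data suits Cockatrice can be made concrete with a simple linear cost model. This model is an assumption, not measured data. SW takes `n * sw_ns`, while HW takes a fixed `overhead_ns` (NIF call, DMA setup) plus `n * hw_ns`. HW wins once `n * (sw_ns - hw_ns) > overhead_ns`, so in integer arithmetic the smallest such n is `overhead_ns / (sw_ns - hw_ns) + 1`.

```c
#include <assert.h>
#include <stdint.h>

/* Smallest input size n for which offloading is strictly faster under the
 * assumed linear model; returns 0 if HW never wins. */
uint64_t offload_break_even(uint64_t overhead_ns, uint64_t sw_ns, uint64_t hw_ns) {
    if (sw_ns <= hw_ns) return 0;       /* no per-element gain from HW */
    uint64_t gain = sw_ns - hw_ns;      /* ns saved per element */
    return overhead_ns / gain + 1;      /* first n with n * gain > overhead */
}
```

For example, take a hypothetical 100 µs overhead, 1 µs per element in SW, and 0.2 µs in HW. Offloading then pays off from 126 elements upward, and shrinking the fixed overhead lowers that threshold directly.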
38. BTW, I love Nerves!!
• My experience at Lonestar 2019 was great!
• I gave a presentation promoting the innovation of Nerves in Japan at Erlang & Elixir Fest 2019!!
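The New World of "IoT with Elixir" Pioneered by Nerves
Hideki Takase
(Kyoto University / JST PRESTO)
takase@i.kyoto-u.ac.jp
Live Demo Menu
1. Preparing and building a Nerves project
2. Writing to microSD, booting, and running IEx
3. Editing the source and updating via local ssh
4. Updating from NervesHub
5. Scenic integration & GPIO device control
Raspberry Pi Zero WH, Adafruit 128x64 OLED Bonnet
https://github.com/takasehideki/eefest19demo
NervesKey
The New World of "IoT with Elixir"!
Device, edge server, cloud
Networking every thing, event, and person!
The mixed martial arts of information science!
Creating new social value!!
Let's build IoT together!
NervesHub
• Remote deployment of Nerves apps via server-based OTA (Over The Air) updates!
- A secure connection path with X.509 signed certificates and the NervesKey circuit
- Update targets and firmware can be specified freely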
I’m Hideki Takase from Kyoto, Japan.
This is my second ElixirConf. The first was Lonestar earlier this year.
So, nice to meet you, or long time no see!
It’s my great pleasure to present our work at ElixirConf.
Thank you so much for accepting my talk proposal.
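Pipelining may be more effective than coarse-grained parallelization; it would be good to be able to specify that.
Since high-level synthesis has to happen somewhere anyway, wouldn't it be quicker to emit HLS C?
There is probably existing research on compilers that convert Elixir/Erlang to C.
Streaming the data makes communication heavy; access to shared memory or similar might improve this.
Should it be possible to specify the optimization target?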
I am a researcher at a university. My main research topic is system-level design methodology for embedded real-time systems, especially the codesign of processors and FPGAs.
I also have a great interest in Elixir for IoT, especially the Nerves Project.
First of all, let me introduce my research collaborators, since Wabi-Sabi is important for Japanese people.
I would like to say a big thank-you to Kentaro and Yasuhiro. They are students in my laboratory and have put great effort into this project.
I appreciate the members of fukuoka.ex, an Elixir community in Fukuoka, Japan. They always give me technical support. As you may know, Zacky and Hisaway will present their work on a novel technology for using GPUs with Elixir.
I also appreciate the members of hls-friends, a Japanese community for self-made high-level synthesis tools in various programming languages. Its members give me useful comments and motivation for my project.
My presentation consists of 3 parts.
How many of you know about FPGAs?
The first part is just like a university lecture. I will introduce hardware design and FPGAs.
OK, let’s go to the 1st part.
The first topic is traditional computing resources. There are two kinds: processors with software, and hardware such as ASICs.
A processor is a resource that can do whatever the software instructs. To cover all the requirements of software applications, a processor has a universal data path and fixed data widths and arithmetic units.
In contrast, hardware can immediately do exactly what you need. Since hardware is designed for a specific requirement, its internal architecture is fixed, and it has dedicated data widths and custom operation units as needed.
There are pros and cons to both SW and HW.
For HW, performance and power efficiency are much better than with a processor, because HW operates exactly on demand. In addition, high-performance parallel processing can be realized if we design the HW carefully.
On the other hand, one advantage of a processor is design flexibility: we can realize various applications just by programming.
An FPGA takes advantage of both: it offers better performance and power efficiency than a processor, and better design flexibility than an ASIC.
So, what is an FPGA?
FPGA stands for field programmable gate array: an LSI whose contents can be changed at any time. So, we can realize unique digital circuits on an FPGA.
There are two major vendors, Xilinx and Intel (formerly Altera).
As shown in this figure, the internal architecture of an FPGA is a two-dimensional array of logic blocks, connection blocks, switching blocks, and I/O blocks. Each logic block consists of lookup tables (LUTs) and D flip-flops.
So, we can change the HW behavior by setting the values of the LUTs and their connections.
This is the common design flow for HW design on FPGAs.
A hardware description language (HDL) is used to design the HW. We must describe the behavior of circuits at the register transfer level (RTL), using constructs such as always blocks and blocking and non-blocking assignments. But designing in HDL is difficult, since its abstraction level is very low.
When we have finished the HW design, we check it by RTL simulation like this, verifying its cycle-level behavior.
This is very laborious work.
Then we apply logic synthesis, technology mapping, and placement and routing. However, don't worry, because EDA tools handle almost all of these steps.
Finally, the EDA tools derive the bitstream, an image file for programming the FPGA chip.
Next, I will show the general usage of FPGAs.
An FPGA is typically used with a processor as an accelerator.
We can offload heavy processing from the processor to the FPGA, so we can expect performance improvement and power savings.
To communicate between the processor and the FPGA efficiently, we need a suitable communication interface.
(Skip this if it is already 13 minutes in.)
As for applications, FPGAs were formerly used for rapid design of ASICs. Used this way, functionality can be verified before ASIC production. Moreover, software developers can start their work before hardware manufacturing is finished.
Recently, FPGAs are also used in final products, which can even be rewritten after shipping. Recent practical applications of FPGAs include financial market transactions, robotics, automotive, data center and cloud servers, and machine learning!
I believe you like machine learning and AI.
The structure of a neural network is often expressed as arrays of numbers. Recall the structure of an FPGA: its regular array architecture suits the structure of neural networks.
Neuron synapse values flow through the pipeline implemented as circuits.
Higher-quality HW/SW cooperative systems can be realized.
Control flow such as sequences, branches, and loops is generated as state machines; variables become registers and arrays become memories.
sequential
Open source!
LegUp is from the University of Toronto; Chisel is from UC Berkeley.
(Aim to reach this point by 18 minutes.)
The effect of Cockatrice is to turn all enemies to stone. So, I chose it as the codename for a tool that turns your Elixir code into HW.
Robust memory management, including per-process GC
Fast per-process restart on failure
Flow, GenStage
Metaprogramming
udmabuf is a Linux device driver that allocates contiguous memory blocks in kernel space as DMA buffers and makes them available from user space. These memory blocks are intended to be used as DMA buffers when a user application implements a device driver in user space using UIO (User space I/O).
Zynq-7000 devices are equipped with dual-core ARM Cortex-A9 processors integrated with 28nm Artix-7 or Kintex®-7 based programmable logic for excellent performance-per-watt and maximum design flexibility. With up to 6.6M logic cells and offered with transceivers ranging from 6.25Gb/s to 12.5Gb/s, Zynq-7000 devices enable highly differentiated designs for a wide range of embedded applications including multi-camera driver assistance systems and 4K2K Ultra-HDTV.
Zynq UltraScale+ MPSoC EG devices feature a quad-core ARM® Cortex-A53 platform running up to 1.5GHz. Combined with dual-core Cortex-R5 real-time processors, a Mali-400 MP2 graphics processing unit, and 16nm FinFET+ programmable logic, EG devices have the specialized processing elements needed to excel in next-generation wired and 5G wireless infrastructure, cloud computing, and Aerospace and Defense applications.
Task, GenServer, and so on
Bigger data would be suitable
The new era of "IoT development with Elixir" pioneered by Nerves technology
by providing the training materials from Frank and Justin.
He agreed to hold it in Japan.
Thank you so much, Justin & Frank,
This is the last slide.
I believe it would be great if Nerves could control the FPGA directly.