Cockatrice: A Hardware Design Environment with Elixir

  1. Cockatrice: Hardware Design Environment with Elixir. Hideki Takase (Kyoto University / JST PRESTO), takase@i.kyoto-u.ac.jp
  2. Who am I? @takasehideki
     − Assistant Professor at Kyoto University
     − Researcher in the PRESTO program, Japan Science and Technology Agency
     My Research Topics/Interests
     − System-level design for embedded real-time systems
     − IoT computing architecture: SW/HW codesign for processors and FPGA!
     − and... Elixir for IoT!!!
  3. Thanks to... with Wabi-Sabi
     • My students in the lab
       − Kentaro Matsui
       − Yasuhiro Nitta
     • My research partners at fukuoka.ex
       − @zacky1972
       − @hisawayex
       − @piacere_ex
       − @enpedasi
     • My friends at hls-friends
       − Tech community for self-made high-level synthesis tools
  4. Introduction to HW Design on FPGA
     • What is FPGA?
     • Advantages, applications, and design flow
     • High-level synthesis and high-level design
  5. Computational Resources
     Processor with software: "do whatever is instructed"
     • universal data path to cover all application cases
     • fixed data width and arithmetic units
     ASIC as hardware: "do what is needed immediately"
     • application-specific architecture
     • dedicated data width and customized units as needed
  6. Computational Resources
     [Figure: FPGA compared with processor with software and ASIC as hardware in terms of design flexibility, development cost, power efficiency, and performance]
  7. What is FPGA?
     • Field Programmable Gate Array
       − LSIs whose contents can be changed at any time
       − We can design a unique digital circuit (HW) on it
       − Two major vendors: Xilinx and Altera (powered by Intel)
     [Figure: FPGA fabric as a grid of blocks. IOB = I/O block, CB = connection block, LB = logic block, SB = switching block. Each LB contains a LUT (IN → OUT: 0000 → 1, 0001 → 0, 0010 → 0, …, 1110 → 1, 1111 → 0) and a D-FF (D, Q)]
  8. Common Design Flow
     RTL description → RTL simulation → logic synthesis → technology mapping → placement and routing → bitstream generation; post-layout simulation
     • Design by Hardware Description Language (HDL)
         always@(posedge clk) begin
           if (!rst) out <= 0;
           else begin
             case (in)
               4'b0001 : tmp <= 1;
               4'b0010 : tmp <= 5;
               4'b1100 : tmp <= 7;
               default : tmp <= 0;
             endcase
           end
         end
         assign out = ~{reg[7:4], tmp[3:0]};
  9. Common Design Flow
     RTL description → RTL simulation → logic synthesis → technology mapping → placement and routing → bitstream generation; post-layout simulation
     EDA tools support almost everything!
  10. How to Use FPGA
     • Offloading heavy processing from the processor to HW on the FPGA
     • Communication between SW and HW: the SW side uses an IF driver, the HW side uses an IF circuit, connected over a communication bus
     • Performance improvement and low power consumption can be achieved
  11. Applications of FPGA
     • Used as LSIs for rapid design of ASICs
       − Functionality of ASICs can be verified before production
       − Development of SW can be started before HW manufacturing is completed
     • Used as LSIs for final products
       − There are already many practical consumer products
       − It is possible to have a rewriting function after shipping
       − Recent practical applications: financial market transactions, robotics and automotive, data center / cloud server, machine learning!!
  12. Machine Learning on FPGA
     • Accelerator of CNN/DNN
       − Neuron synapse values flow through the pipeline
     Ref: K. Ovtcharov, et al. Toward Accelerating Deep Learning at Scale Using Specialized Hardware in the Datacenter, HotChips27, 2015.
     C. Zhang, et al. Optimizing FPGA-based Accelerator Design for Deep Convolutional Neural Networks, FPGA 2014.
  13. Advantage of FPGA
     [Figure: one FPGA containing a memory and many Func blocks]
     • Various systems can be designed onto one LSI
     • High performance / low power consumption
     • Parallel processing can be realized at task/data level
     • Data streaming processing can be realized
  14. Current Technology Trends
     • Increase in circuit scale and amount of LB
       − High performance systems can be realized
       − Further increase will continue with new technology: multi-die, 3D stacking, ...
     • Tight coupling with processors
       − General-purpose: connection via PCIe to processors
       − Embedded: integration with embedded processors
     High-quality system design in a short time has become difficult...
  15. High Level Synthesis (HLS)
     • Solution to improve design productivity!
       − Technology for synthesizing HDL from behavioral descriptions in a programming language; C/C++ or its extension is commonly used
       − Abstraction level of design becomes higher
         int func (int x) { int a[N]; int i; for(i=0;i<N;i++){ a[i] = ・・・; : : } : }
     [Figure: synthesized circuit func with input x, counter i, and memory a]
  16. Commercial HLS Tools
     • Xilinx Vivado HLS
       − Synthesize from C/C++
       − #pragma is offered to indicate the optimization
     • Intel SDK for OpenCL
       − Synthesize from OpenCL parallelized code
       − Can be executed with the same description as the host PC
     It is essential to understand #pragma and libraries deeply for deriving optimized hardware
     Ref: Xilinx Inc. White paper UG902
     D. Neto, Optimizing OpenCL for Altera FPGAs, Int’l Workshop on OpenCL, 2014.
  17. Not only C/C++!!
     • Chisel: Scala based
       − Object Oriented / Functional styled DSL
     • CλaSH: Haskell based
       − Synthesize HDL from description of functional language
     • Karuta: original scripting language
     • Synthesijer: Java based
       − HLS from the subset of Java specification
     • PyCoRAM, Polyphony: Python based
       − Veriloggen: Python library for HDL design
     • Mulvery: Ruby based
       − Synthesis from Reactive Programming
     • Octopus🐙: OCaml based
     Developed by Japanese hls-friends!!
  18. OK, What We Want is...
     We want to design HW by Elixir!!
     We want to operate HW from our Elixir code!!
  19. Concept of Cockatrice
     • Elixir Zen Style
     • Why would Elixir be suitable for HW design?
     • HW synthesis flow from Elixir code
     • SW/HW communication interface
  20. What is Cockatrice?
     • Summoned beast that appears in FF4 (^^;
       − Its effect is to turn all enemies to stone
     • Hardware design environment with Elixir!
     • Features
       − It synthesizes Elixir Zen Styled code to the description of HW circuits
       − It provides a communication interface between Elixir code and HW circuits
     Your Elixir code can be accelerated, and low-powered!!
     NOTE: Current logo of Cockatrice is from Wikipedia
  21. Elixir Zen Style
     • Enum: transform data directly
     • Flow: realize parallel processing intuitively (by MapReduce)
     • |>: the pipeline operator expresses data flow intuitively
     • Zen (禅) means the essential beauty
       − The essence of programming is data transformation
       − Enum, Flow, and |> describe only data transformation
         input_list
         |> Flow.from_enumerable()
         |> Flow.map(& foo(&1))
         |> Flow.map(fn a -> -a end)
         |> Enum.to_list
         |> Enum.sort
  22. Zen’s process model
         input_list
         |> Flow.from_enumerable(stages: 4)
         |> Flow.map(& foo(&1))
         |> Flow.map(fn a -> -a end)
         |> Enum.to_list
         |> Enum.sort
     [Figure: process model with input_list, from_enumerable, four parallel foo and -a stages, an arbitrator, to_list, and sort]
     It’s similar to efficient HW architecture!!
  23. Zen is suitable for HW design!
         input_list
         |> Flow.from_enumerable(stages: 4)
         |> Flow.map(& foo(&1))
         |> Flow.map(fn a -> -a end)
         |> Enum.to_list
         |> Enum.sort
     [Figure: Cockatrice turns the process model (input_list, from_enumerable, four parallel foo and -a stages, arbitrator, to_list, sort) into HW]
     We summon Cockatrice to lithify Elixir Zen Styled Code as parallel HW stones!!
  24. Effect of Cockatrice
     [Figure: Input List, from_enumerable, replicated foo and -a stages running in parallel, arbitrator, to_list, and sort]
  25. HW Description by Elixir
     • The defcockatrice part will be treated as HW description
       − It is completely equivalent to native Elixir code
       − You do not need to consider HW design
       − It can be verified at the functional level
     • A HW module can be called in the same way as a SW function
       − We assume SW/HW cooperative systems
  26. Synthesis Flow
     [Figure: synthesis flow diagram]
     • Inputs: design description (Elixir), templates for IP (DSL)
     • Code analysis & AST optimization → info. of description (AST)
     • Synthesis of HW modules from Elixir functions → HW IP modules (HDL)
     • Synthesis of data flow → data flow HW circuit (HDL)
     • Logic synthesis → HW circuits (bitstream)
     • Generation of device driver of I/F circuit → I/F driver (C (NIF))
     • Compilation of SW → SW app (Elixir + C (NIF))
  27. Synthesis Flow (step: code analysis & AST optimization)
     A metaprogramming method is employed to derive the AST of the Zen styled design description with the quote function
  28. Synthesis Flow (step: synthesis of HW modules from Elixir functions)
     We provide templates of HDL code that are equivalent to Enum functions as DSL files
     HDL code is synthesized by applying pattern matching with the AST and DSL
  29. Synthesis Flow (step: synthesis of data flow)
     Each module is connected as a data flow from the AST representation of |> and Flow
     A data flow and parallel processing HW circuit is finally synthesized!!
  30. Synthesis Flow (step: generation of device driver of I/F circuit)
     The communication interface and its driver are generated as NIF functions
  31. Synthesis Flow (steps: compilation of SW and logic synthesis)
     SW binary and HW bit files are compiled by their respective tools
  32. SW/HW Comm. Interface
     • Activation/operation of HW from Elixir code
     • Data communication between SW and HW
       − AXI4 bus on Zynq is used
     • We implement the device driver as a C/NIF module
       − ikwzm/udmabuf is used for DMA transfer
       − Elixir/Erlang list should be converted to C array
     [Figure: on the processor, the Elixir app runs on the Erlang VM with the device driver (NIF module); on the FPGA, the interface circuit connects to the HW circuits; a DMA buffer sits between them]
  33. Our Targets: Zynq-7000, Zynq UltraScale+
  34. Demonstration Time??
     • Board: Avnet Ultra96-V2 ($249.00)
       − Zynq UltraScale+ ZU3EG
       − 1.5GHz quad-core Arm Cortex-A53
       − 16nm FinFET+ programmable logic
     • EDA tool: Xilinx Vivado 2019.1
     • Software platform
       − Linux Kernel v4.19.0 with debian10-rootfs-vanilla from ikwzm/ZynqMP-FPGA-Linux build-v2019.1, with udmabuf v1.4.2 as kernel module
       − Elixir 1.9.1-otp-22 / Erlang 22.0.7
  35. Discussion & Future Direction
  36. Discussion
     • Currently, we have only implemented prototypes
       − We will publish them as Hex pkgs very soon...
       − Currently supported features are limited
         IOW, we only synthesize Zen styled code
         Are other Elixir/Erlang process models suitable for efficient HW architecture?
       − Quantitative evaluation of our proposal will also be important (to verify academic contribution^^;
  37. Discussion
     • Applicable range of Cockatrice?
       − Not only embedded, but also HPC domain!?
         Bigger data would be suitable for Cockatrice since there is some overhead on SW/HW comm.
       − AI/ML would be a killer application
         Big data stream processing for IoT
         Cloud processing that allows users to change functions flexibly
       − We are planning to support large-scale FPGA boards with a comm. interface for PCIe bus
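  38. BTW, I love Nerves!!
     • My experience at Lonestar2019 was great!
     • I made a presentation to promote the innovation of Nerves to Japan at Erlang & Elixir Fest 2019!!
     [Slides shown from that presentation, translated from Japanese:]
     − Title: The new world of "IoT with Elixir" pioneered by Nerves. Hideki Takase (Kyoto University / JST PRESTO), takase@i.kyoto-u.ac.jp
     − Live demo menu: 1. Prepare and build a Nerves project 2. Write to microSD, boot, and run IEx 3. Edit the source and write it over local ssh 4. Write from NervesHub 5. Scenic integration & GPIO device control. Hardware: Raspberry Pi Zero WH, Adafruit 128x64 OLED Bonnet, NervesKey. https://github.com/takasehideki/eefest19demo
     − The new world of "IoT with Elixir"!: devices, edge servers, cloud. Network every thing, event, and person! The mixed martial arts of information science! Create new social value!! Let's build IoT together!
     − NervesHub: remotely deploy Nerves apps by OTA (Over The Air) via a server! A secure connection path is realized with X.509 signed certificates and the NervesKey circuit. Update targets and firmware can be specified freely.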
  39. Nerves Training in Japan!! Thank you so much, Justin & Frank!!
  40. Future Direction
     What will happen when Nerves meets Cockatrice?
     Please help us evolve the new era of "IoT development with Elixir"
  41. Thank you for your attention!!
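
The Zen styled pipeline on slides 21 to 25 can be tried out with the standard library alone. The following is a minimal sketch, not Cockatrice code: the module name `ZenSketch` and the body of `foo/1` are hypothetical, and `Task.async_stream/3` is used here as a stand-in for `Flow` with `stages: 4` so that no dependency is needed. It illustrates the point of slide 25 that the description can be verified at the functional level, because the parallel variant must return the same list as the sequential one.

```elixir
defmodule ZenSketch do
  # Hypothetical stand-in for the foo/1 used on the slides.
  def foo(x), do: x * 2

  # Sequential reference: a plain Enum pipeline.
  def sequential(input_list) do
    input_list
    |> Enum.map(&foo/1)
    |> Enum.map(fn a -> -a end)
    |> Enum.sort()
  end

  # Parallel variant: up to `stages` elements are processed concurrently.
  # Task.async_stream/3 is an assumption standing in for Flow here.
  def parallel(input_list, stages \\ 4) do
    input_list
    |> Task.async_stream(fn x -> -foo(x) end, max_concurrency: stages, ordered: false)
    |> Enum.map(fn {:ok, value} -> value end)
    |> Enum.sort()
  end
end

input_list = [3, 1, 2]
IO.inspect(ZenSketch.sequential(input_list))
IO.inspect(ZenSketch.parallel(input_list))
```

Because the final `Enum.sort/1` fixes the order, the unordered parallel stages do not change the result, which is what makes this style easy to map onto replicated units followed by an arbitrator.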

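Slides 27 and 29 say that the AST of the design description is derived with quote and that modules are connected according to the `|>` chain. A minimal sketch of that idea follows, using only `quote`, `Macro.unpipe/1`, and `Macro.decompose_call/1` from the standard library. The module `ZenAstSketch` and its `stages/1` function are hypothetical helpers for illustration; how Cockatrice itself walks the AST is not shown in the slides.

```elixir
defmodule ZenAstSketch do
  # Returns the stages of a quoted |> pipeline as strings, in data-flow order.
  def stages(ast) do
    ast
    |> Macro.unpipe()
    |> Enum.map(fn {stage, _arg_position} -> describe(stage) end)
  end

  defp describe(stage) do
    case Macro.decompose_call(stage) do
      # Remote call such as Flow.map(...): keep module and function name.
      {remote, fun, _args} -> "#{Macro.to_string(remote)}.#{fun}"
      # Local call or variable such as input_list.
      {name, _args} -> Atom.to_string(name)
      # Anything else (for example a literal) is printed as source text.
      :error -> Macro.to_string(stage)
    end
  end
end

# quote only builds the AST, so Flow does not need to be installed here.
pipeline =
  quote do
    input_list
    |> Flow.from_enumerable(stages: 4)
    |> Flow.map(&foo(&1))
    |> Flow.map(fn a -> -a end)
    |> Enum.to_list
    |> Enum.sort
  end

IO.inspect(ZenAstSketch.stages(pipeline))
```

Each element of the resulting list could then be matched against a template, in the spirit of the pattern matching between AST and DSL described on slide 28.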
Editor's Notes

  • I’m Hideki Takase from Kyoto, Japan.
    This is my second time attending ElixirConf. The first time was Lonestar this year.
    So, nice to see you, or long time no see!
    It’s a great pleasure to present our work at ElixirConf.
    Thank you so much for accepting my talk proposal.

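    Pipelining might be more effective than coarse-grained parallelization. It would be good to be able to specify that.
    Since high-level synthesis has to happen somewhere anyway, wouldn't it be quicker to take the approach of emitting HLS C?
    There is probably research on compilers that convert Elixir/Erlang to C.
    Streaming the data makes communication heavy. It would probably be better to improve this with access to shared memory or the like.
    Should we make the optimization target specifiable?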
  • I am a researcher at a university. My major research topic is system-level design methodology for embedded real-time systems, especially the codesign of processors and FPGAs.
    I also have a great interest in Elixir for IoT, especially the Nerves Project.
  • First of all, let me introduce my research collaborators, since Wabi-Sabi is important for Japanese people.

    I would like to say a big thank-you to Kentaro and Yasuhiro. They are students in my laboratory and have made a great effort on this project.
    I appreciate the members of fukuoka.ex, an Elixir community in Fukuoka, Japan. They always give me technical support. As you may know, Zacky and Hisaway will present their work on a novel technology for GPUs with Elixir.
    I also appreciate the members of hls-friends, a Japanese community for self-made high-level synthesis tools built on various programming languages. These members give useful comments and motivation to my project.
  • My presentation consists of 3 parts.
    How many of you know about FPGA?

    The first part is like a university lecture. I will introduce hardware design and FPGA.

    OK, let’s go to the 1st part.
  • The first topic is traditional computing resources. There are two kinds of resources: processors with software, and hardware such as ASICs.

    A processor is a resource that can do whatever it is instructed to do by software. To cover all requirements from software applications, a processor has a universal data path, fixed data width, and fixed arithmetic units.
    In contrast, hardware immediately does exactly what is needed. Since hardware is designed for a specific requirement, its internal architecture is fixed, and it has dedicated data widths and custom operation units as needed.
  • There are pros and cons to both SW and HW.
    For HW, performance and power efficiency are much better than with a processor because the HW is built to match the demand. In addition, high-performance parallel processing can be realized easily if we design the HW carefully.
    On the other hand, one of the advantages of a processor is design flexibility. We can realize various applications depending on the program.

    FPGA takes the advantages of both resources. It offers better performance and power efficiency than a processor, and better design flexibility than an ASIC.
  • Let me introduce what an FPGA is.
    FPGA stands for field programmable gate array. It is an LSI whose contents can be changed at any time. So, we can realize a unique digital circuit on an FPGA.
    There are 2 major vendors, Xilinx and Intel Altera.

    As shown in this figure, the internal architecture of an FPGA is expressed as a systolic array of logic blocks, connection blocks, switching blocks, and I/O blocks. A logic block consists of a lookup table (LUT) and a D flip-flop.

    So, we can change the HW behavior by deciding the values of the LUTs and their connections.
  • This is the common flow for HW design on FPGA.

    A hardware description language (HDL) is used to design the HW. We have to design the behavior of circuits at the register transfer level, using constructs such as always blocks and blocking and non-blocking assignments. Designing in HDL is difficult because its abstraction level is very low.

    When we have finished the design of the HW, we check it by register transfer level simulation like this. We have to verify the design at cycle-level behavior.
    This is very laborious work.
  • And then, we apply logic synthesis, technology mapping, and placement and routing. But don’t worry, because EDA tools support almost all of these steps.

    Finally, the bitstream, which is an image file to be written to the FPGA chip, is derived by the EDA tools.
  • I will show the general usage of FPGA.
    An FPGA is typically used together with a processor, as an accelerator.

    We can offload part of the heavy processing from the processor to the FPGA. So, we expect performance improvement and power savings by utilizing the FPGA.

    To communicate efficiently between the processor and the FPGA, we need a suitable communication interface.
  • Skip this slide if we are already at 13 minutes.
    As for applications, FPGAs were formerly used for rapid design of ASICs. By using an FPGA like this, functionality can be verified before the production of the ASIC. Moreover, software developers can start their work before hardware manufacturing is finished.

    Recently, FPGAs are used in final products. It is possible to have a rewriting function after shipping. Recent practical applications of FPGA are financial market transactions, robotics, automotive, data center and cloud servers, and machine learning!
  • I believe you like machine learning and AI.
    The structure of a neural network is often expressed as an array of numbers. Please remember the structure of FPGA. Its systolic architecture is suitable for the structure of a neural network.
    Neuron synapse values flow through the pipeline as circuits.
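  • Achieves higher performance and lower power consumption than processing on a processor.
    Manufacturing cost can be kept lower than for an ASIC.

    A way to eliminate the von Neumann bottleneck!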
  • Higher quality HW/SW cooperative system can be realized

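  • Control such as sequence, branch, and loop is generated as a state machine; variables are generated as registers, and arrays as memory.
  • sequential
  • Open source!
    LegUp is from the University of Toronto, Chisel is from UCB.
  • I want to reach this slide at 18 minutes.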
  • The effect of Cockatrice is to turn all enemies to stone. So, I chose it as the codename for the tool that turns your Elixir code into HW.

  • Robust memory management, including per-process GC.
    On failure, fast restart on a per-process basis.
  • flow genstage
  • Metaprogramming
  • udmabuf is a Linux device driver that allocates contiguous memory blocks in the kernel space as DMA buffers and makes them available from the user space. It is intended that these memory blocks are used as DMA buffers when a user application implements device driver in user space using UIO (User space I/O).
  • Zynq-7000 devices are equipped with dual-core ARM Cortex-A9 processors integrated with 28nm Artix-7 or Kintex®-7 based programmable logic for excellent performance-per-watt and maximum design flexibility. With up to 6.6M logic cells and offered with transceivers ranging from 6.25Gb/s to 12.5Gb/s, Zynq-7000 devices enable highly differentiated designs for a wide range of embedded applications including multi-camera drivers assistance systems and 4K2K Ultra-HDTV.

    EG devices feature a quad-core ARM® Cortex-A53 platform running up to 1.5GHz. Combined with dual-core Cortex-R5 real-time processors, a Mali-400 MP2 graphics processing unit, and 16nm FinFET+ programmable logic, EG devices have the specialized processing elements needed to excel in next-generation wired and 5G wireless infrastructure, cloud computing, and Aerospace and Defense applications.
  • 32 minutes?
    picocom /dev
    source config.txt
    sudo iex -S mix
  • Task, GenServer, and so on
  • Bigger data would be suitable
  • The new era of "IoT development with Elixir" pioneered by Nerves technology
  • by providing the training materials from Frank and Justin.
    He agreed to hold it in Japan.
    Thank you so much, Justin & Frank!

  • This is the last slide.

    I wonder what would happen if Nerves could control the FPGA directly.
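
Slide 32 notes that an Elixir/Erlang list has to be converted to a C array before DMA transfer. The memory layout this implies can be illustrated on the Elixir side with a binary comprehension. This is a sketch under stated assumptions: the module `DmaPackSketch` is hypothetical, the element type is assumed to be a little-endian signed 32-bit integer, and in the actual design the conversion is said to happen inside the C/NIF device driver rather than in Elixir.

```elixir
defmodule DmaPackSketch do
  # Packs a list of integers into one contiguous binary,
  # laid out like a C int32_t array on a little-endian machine (assumed format).
  def pack(list) do
    for x <- list, into: <<>>, do: <<x::little-signed-32>>
  end

  # Reads the same layout back into a list of integers.
  def unpack(binary) do
    for <<x::little-signed-32 <- binary>>, do: x
  end
end

packed = DmaPackSketch.pack([1, -1, 256])
IO.inspect(byte_size(packed))
IO.inspect(DmaPackSketch.unpack(packed))
```

A contiguous binary like this is the kind of data that fits a DMA buffer such as the one udmabuf provides, whereas an Erlang list is a linked structure that cannot be transferred as-is.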
