I analyzed the TensorFlow XLA code.
This document covers the AOT part of TensorFlow XLA.
This document discusses the RISC-V boot process using the Berkeley Boot Loader (BBL) and the RISC-V Proxy Kernel (PK). It explains how, upon reset, code running in Machine mode initializes the system and switches to Supervisor mode. The boot loader then loads a payload ELF into memory: a Linux kernel for BBL, a user application for PK. Control is then transferred to the loaded program (the kernel runs in Supervisor mode, the user application in User mode). Trap handling across the different privilege modes is also covered.
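The boot handoff described above can be sketched as a tiny state model. This is purely illustrative, not real firmware: privilege modes are plain strings and the "payload" is just a label, to make the order of transitions explicit.

```python
# Illustrative model of the BBL/PK boot flow; not real firmware.
def boot(loader):
    trace = [("M", "init")]  # reset: Machine-mode system setup
    if loader == "BBL":
        # BBL loads a Linux kernel, which runs in Supervisor mode.
        payload, run_mode = "linux-kernel", "S"
    else:
        # PK loads a user application, which runs in User mode.
        payload, run_mode = "user-app", "U"
    trace.append((run_mode, f"run {payload}"))
    return trace

print(boot("PK"))
```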
How to make an FPGA accelerator: IBM POWER + CAPI edition.
This adds an HDL simulation environment to the document at
http://www.slideshare.net/ssuser479fa3/ibm-capi .
I analyzed the TensorFlow XLA code.
This document covers the JIT part of TensorFlow XLA.
2017/07/01
Switched from the teaser version to the version with all pages public.
Also added notes on the plugin introduced in the latest version.
2017/07/30
Updated to reflect the r1.3 code.
2017/08/04
Succeeded in running the r1.3 plugin code.
With "device:XLA_EXEC:0", execution reached the point where StreamExecutor starts.
2017/08/07
Added 10 pages relative to the 2017/08/04 version.
For now, this concludes the work on r1.3.
About u-boot and Linux on the Tegra 186 (Tegra-P1: a Tegra with a Pascal GPU), in particular the BPMP (Boot and Power Management Processor).
This document discusses how to program FPGAs using free and open source software tools. It explains what an FPGA is and how it works, and walks through examples of using Verilog to program simple logic gates and circuits on an FPGA board. These include turning on an LED, implementing a NAND gate in hardware, and using the FPGA as an interactive input/output device. The document promotes the IceStorm toolchain for programming Lattice FPGA boards and outlines several other hardware description languages and frameworks. It concludes by discussing future developments that could expand the use of open source FPGA programming.
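Before synthesizing the NAND-gate example mentioned above onto a board, its expected behavior can be checked in software. The sketch below is a plain Python model of the gate's truth table, not the Verilog itself.

```python
# Software model of a NAND gate, for checking the truth table that the
# Verilog implementation on the FPGA should reproduce.
def nand(a: int, b: int) -> int:
    return 0 if (a and b) else 1

for a in (0, 1):
    for b in (0, 1):
        print(a, b, nand(a, b))
```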
ZynqMP Boot and Power Management - Mr. Vengineer
This is the material I used at the Zynq UltraScale+ MPSoC study group on Friday, February 20, 2016.
Addendum) 2016.05.08
Noted that an implementation for the Zynq UltraScale+ MPSoC has been added to the official ARM Trusted Firmware site.
The document discusses software driven verification using Xilinx's xsim simulator. It describes using the Xilinx Simulator Interface (XSI) which allows a C/C++ program to act as a testbench for an HDL design in xsim. It provides details on how to use XSI functions like getting port numbers and values, running simulation, and controlling simulation from C++. It also discusses calling XSI functions through dynamic linking and using SystemVerilog DPI to directly access the DUT from C++.
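The XSI control flow described above (look up ports, drive values, step the simulation, read results) can be sketched against a mock design. The real interface is the C/C++ XSI API inside xsim; every name below is illustrative, not the actual Xilinx API.

```python
# Python sketch of a testbench driving a DUT through get/put/run calls,
# mirroring the XSI flow. MockDut stands in for an HDL design loaded
# into xsim; here it behaves as a 1-bit inverter.
class MockDut:
    def __init__(self):
        self.ports = {"a": 0, "y": 1}
    def get_port_number(self, name):
        # XSI-style lookup of a port's index by name
        return list(self.ports).index(name)
    def put_value(self, name, value):
        self.ports[name] = value
    def run(self, steps):
        # advance "simulation time": evaluate the inverter
        self.ports["y"] = 1 - self.ports["a"]
    def get_value(self, name):
        return self.ports[name]

dut = MockDut()
dut.put_value("a", 1)      # drive the input from the testbench
dut.run(10)                # step the simulation
print(dut.get_value("y"))  # read back the output
```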
TVM uses Verilator and DPI to connect accelerator models written in Verilog/Chisel to Python code. It initializes the hardware model and controls simulation using methods like SimLaunch, SimWait, and SimResume. The Python code loads the accelerator module, allocates memory, and runs the accelerator by calling driver functions that interface with the DPI to initialize, launch, and wait for completion of the accelerator. This allows accelerators developed in Verilog/Chisel to be tested from Python.
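The driver flow summarized above (allocate buffers, launch the accelerator, wait for completion) can be sketched with a mock device. The method names echo the summary; the mock is illustrative, since the real path goes through Verilator and the DPI bridge.

```python
# Sketch of the Python-side driver flow: launch the accelerator on a
# pair of buffers, then block until it signals completion. MockAccel
# stands in for a Verilog/Chisel model behind the DPI bridge.
class MockAccel:
    def __init__(self):
        self.done = False
    def sim_launch(self, src, dst):
        # the "accelerator" adds 1 to every element
        dst[:] = [x + 1 for x in src]
        self.done = True
    def sim_wait(self):
        assert self.done, "accelerator still running"

src = [1, 2, 3]
dst = [0, 0, 0]
accel = MockAccel()
accel.sim_launch(src, dst)  # kick off the accelerator
accel.sim_wait()            # wait for completion
print(dst)
```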
Cloud Deep Learning Chips Training & Inference - Mr. Vengineer
This document summarizes various chips for deep learning training and inference in the cloud from companies such as Google, Intel, Habana Labs, Alibaba, and Graphcore. It provides information on the specs and capabilities of each chip, such as the memory type and TFLOPS, and links to product pages and documentation. It also discusses collaborations between companies on projects like Glow, ONNX, and OCP accelerator modules.
Glow is a compiler and execution engine for neural networks created by Facebook. It takes a high-level graph representation of a neural network and compiles it into efficient machine code for different hardware backends like CPU and OpenCL. The key steps in Glow include loading a model, optimizing the graph, lowering it to a low-level IR, scheduling operations to minimize memory usage, generating instructions for the backend, and performing optimizations specific to the target. Glow aims to provide a portable way to deploy neural networks across different hardware platforms.
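The compilation steps listed above can be modeled as a pass pipeline. This is purely illustrative: each "pass" is a function from one representation to the next, mirroring the graph-to-IR-to-backend flow, and the ops and names are made up.

```python
# Toy pass pipeline mirroring Glow's steps: load a model, optimize the
# graph, lower it to a low-level IR, schedule, then emit backend code.
def load_model(name):
    return {"graph": ["conv", "relu", "fc"], "model": name}

def optimize_graph(m):
    # stand-in for graph-level optimizations (fusion, elimination)
    m["graph"] = [op for op in m["graph"]]
    return m

def lower_to_ir(m):
    # high-level graph nodes become low-level IR instructions
    m["ir"] = [f"ir.{op}" for op in m["graph"]]
    return m

def schedule(m):
    # stand-in for memory-usage-aware operation scheduling
    m["ir"].sort()
    return m

def codegen(m, backend):
    # emit instructions for a specific backend (e.g. CPU, OpenCL)
    return [f"{backend}:{instr}" for instr in m["ir"]]

module = schedule(lower_to_ir(optimize_graph(load_model("resnet"))))
print(codegen(module, "cpu"))
```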
Bridge TensorFlow to run on Intel nGraph backends (v0.4) - Mr. Vengineer
This document discusses bridging TensorFlow to run on Intel nGraph backends. It summarizes various optimization passes used in the nGraph-TensorFlow integration, including passes to liberate nodes from placement constraints, confirm placement, cluster the graph, and encapsulate clusters. Key points:
- NGraphLiberatePass and NGraphConfirmPass run during the PRE_PLACEMENT phase to handle nGraph placement
- NGraphClusterPass runs during POST_REWRITE_FOR_EXEC to cluster the graph into subgraphs, similar to XLA partitioning
- NGraphEncapsulatePass encapsulates clusters into NGraphEncapsulateOp nodes, analogous to XLA's use of _XlaLaunchOp
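The cluster-then-encapsulate rewriting described above can be sketched as follows. The graph and the supported-op set are made up for illustration; contiguous supported nodes are grouped, and each group is replaced by one "encapsulate" node, analogous to how NGraphEncapsulatePass wraps clusters in NGraphEncapsulateOp.

```python
# Sketch of clustering a graph into backend-supported subgraphs and
# encapsulating each cluster into a single node.
SUPPORTED = {"Add", "Mul", "Relu"}  # hypothetical nGraph-supported ops

def cluster(nodes):
    clusters, current = [], []
    for op in nodes:
        if op in SUPPORTED:
            current.append(op)          # grow the current cluster
        else:
            if current:
                clusters.append(current)
                current = []
            clusters.append(op)         # unsupported op stays standalone
    if current:
        clusters.append(current)
    return clusters

def encapsulate(clusters):
    # replace each cluster with a single wrapper node
    return [f"Encapsulate({'+'.join(c)})" if isinstance(c, list) else c
            for c in clusters]

graph = ["Add", "Mul", "Print", "Relu", "Add"]
print(encapsulate(cluster(graph)))
```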
Bridge TensorFlow to run on Intel nGraph backends (v0.5) - Mr. Vengineer
The document describes how the nGraph TensorFlow bridge works by rewriting TensorFlow graphs to run on Intel nGraph backends. It discusses how optimization passes modify the graph in several phases: 1) capturing TensorFlow variables as nGraph variables, 2) marking/assigning/deassigning nodes to clusters, and 3) encapsulating clusters into NGraphEncapsulateOp nodes so subgraphs run on nGraph. Key classes and files, such as NGraphVariableCapturePass and NGraphEncapsulatePass, are described, along with how they implement the different rewriting phases that prepare the graph for nGraph execution.
Within TensorFlow XLA, the XLA Client can now be used from Python.
I also added notes on the SysML paper (JAX @ Google) presented in February 2018.
Tiramisu is a code optimization and generation framework that can be integrated into custom compilers. It supports various backends including multi-CPU (using LLVM), GPU (using CUDA), distributed systems (using MPI), and FPGAs (using Xilinx Vivado HLS). Tiramisu uses polyhedral representations to support irregular domains beyond just rectangles. The document provides an overview of Tiramisu and discusses challenges related to supporting different platforms, memory dependencies, efficient code generation, and representations. It also mentions that Tiramisu uses Halide and ISL.
Tiramisu : A Code Optimization Framework for High Performance Systems
https://www.csail.mit.edu/research/tiramisu-framework-code-optimization-and-code-generation
This is an overview of the framework above.
Since there is almost no documentation, I analyzed the source code and investigated what the sample programs do.