SlideShare a Scribd company logo
SFO17-509
Deep Learning on ARM Platforms
- from the platform angle
Jammy Zhou - Linaro
ENGINEERS
AND DEVICES
WORKING
TOGETHER
Agenda
● Deep learning basics
● Platform overview
● Gaps and challenges
ENGINEERS AND DEVICES
WORKING TOGETHER
Basic Elements
Algorithms & Models
CNN, R-CNN, RNN, LSTM, Feed
Forward, BP, SGD, Deep
Reinforcement learning, etc
AlexNet, VGG, GoogLeNet,
ResNet, SqueezeNet, SyntaxNet,
MobileNet, SSD, YOLO, etc
Computing Platforms
CPU, GPU, FPGA, TPU, etc
Deep learning software stack
Datasets
MNIST, ImageNet, CIFAR10,
CIFAR100, AudioSet, DET,
CLS-LOC, ActivityNet, etc
+ +
Deep Learning
ENGINEERS AND DEVICES
WORKING TOGETHER
● Training on cloud & data center (desktop workstation can be used as well for development)
● Inference on cloud & data center or edge device
Training & Inference
Cloud and Data Center Edge Devices
Dataset
& Labels
DL Framework Trained Model
Device1
Device2
New Data
Inference Result
Trained
Model DL Framework
New Data
Inference Result
Batch
processing
for large
dataset
Untrained Model
DL Framework
ENGINEERS AND DEVICES
WORKING TOGETHER
Platform Overview
Python Apps
Frameworks
Drivers
Libraries
Hardware Accelerators
TensorFlowCaffe MXNet Theano
Caffe2 Torch7 CNTK Paddle...
C++ Apps Go Apps Java Apps...
BLAS FFT RNG SPARSE Eigen...
Application support for
multiple frameworks?
Framework support for
multiple accelerator
solutions?
ENGINEERS AND DEVICES
WORKING TOGETHER
Heterogeneous Accelerators
● Nvidia
○ Tesla, Quadro, GeForce, and etc
○ Drive PX for self driving
○ Jetson TK1/TX1/TX2 for embedded
● AMD
○ Radeon Instinct, Radeon Pro
○ Embedded G series
● Mobile GPUs
○ ARM Mali
○ Qualcomm Adreno
○ Imagination PowerVR
○ VeriSilicon Vivante
● Google TPU
● Intel Nervana
● Nvidia Xavier DLA
● Horizon Robotics (地平线) BPU
● Cambricon (寒武纪) NPU
● Fujitsu DLU
● GraphCore IPU
● Intel/Altera Stratix, Arria, Cyclone
● Xilinx Ultrascale, Ultrascale+
● Deephi Tech (深鉴科技) DPU (Xilinx based)
● Baidu XPU (Xilinx based)
● Qualcomm Hexagon
● CEVA XM
● TI C66x
● Cadence Tensilica Vision
GPU ASIC
FPGA DSP
ENGINEERS AND DEVICES
WORKING TOGETHER
Interface & Libraries
● Nvidia CUDA
○ Proprietary (including the driver stack)
○ Widely supported by the DL frameworks and
the industry
● AMD HIP
○ Fully open source together with the ROCm
driver stack
○ Support both NVCC (for CUDA) and HCC
compiler backends
● Khronos Standards
○ OpenCL
■ Support not only GPU, but also FPGA,
DSP and other accelerators
○ OpenVX with Neural Network Extension
■ Acceleration for image processing and
computer vision applications
○ Vulkan
■ Designed for both graphics and
compute, mainly used for graphics now
● Nvidia
○ cuDNN, cuBLAS, cuSPARSE, etc
● AMD
○ MIOpen, hipBLAS/rocBLAS, hcRNG,
hcFFT/rocFFT, etc
● OpenCL
○ AMD clBLAS, clFFT, clRNG, clSPARSE
○ ARM Compute Library, requires:
■ cl_arm_non_uniform_work_group_size
■ cl_khr_fp16
○ Qualcomm Neural Processing Engine
● Others
○ Eigen - C++ template library for LA
○ OpenBLAS, NNPack, MKL, etc
○ OpenCV - Open source CV library
■ Have Halide based DNN module
○ Xilinx DNN and GEMM libraries
ENGINEERS AND DEVICES
WORKING TOGETHER
Frameworks
Vendor Languages CUDA HIP OpenCL ACL NPE[1]
Caffe BVLC C++, Python, Matlab Yes Yes [2]
Yes [2]
Yes [2]
Yes
Caffe2 Facebook C++, Python Yes Yes [2]
No No Yes
TensorFlow Google Python, C++, JAVA, Go, etc Yes No [3]
Yes [2]
No Yes
MXNet Apache Python, R, C++, Perl, etc Yes Yes [2]
No Yes [2]
No
Torch Community Lua, Python, Matlab Yes Yes [2]
Yes [2]
No No
Theano Montreal Python Yes No Yes [4]
No No
CNTK Microsoft Python, C++, C#, .Net, etc Yes No [3]
No No No
Paddle Baidu Python, Go Yes No No No No
[1] The trained models are converted to DLC format for execution
[2] Out-of-tree support, by vendors and community
[3] Under development
[4] Seems incomplete and buggy
ENGINEERS AND DEVICES
WORKING TOGETHER
Benchmark and Testing
● Collective Knowledge SDK for AI
○ Automate benchmarking, optimization, co-design the
whole deep learning stack to satisfy different
requirements (e.g, execution time, accuracy, power
consumption, memory usage, etc)
○ Have portable and customizable workflow framework
○ CK modules with open & unified JSON APIs are used to
abstract access to changing SW & HW
○ Image classification and compiler flag prediction are
supported now
● DeepBench from Baidu
○ Benchmark basic low-level operations on different
hardwares for deep learning
● Convnet-benchmarks
ENGINEERS AND DEVICES
WORKING TOGETHER
Gaps and Challenges - Arch Agnostic
● Vendor specific APIs are used for heterogeneous accelerators
○ Nvidia CUDA, AMD HIP, Google TPU StreamExecutor, and etc
○ Can OpenCL or something else be a competitive solution to rule the other world?
● OpenCL support is quited limited so far
○ There is no official OpenCL acceleration support by default for almost all major DL frameworks,
although some of them have out-of-tree experimental support.
■ Can we get OpenCL support in upstream DL frameworks to reduce maintain effort?
■ Shall we enable and optimize OpenCL support for more DL frameworks?
○ Can Coriander be a solution to use existing CUDA backend of DL frameworks on OpenCL?
■ AMD HIP has similar mechanism to support CUDA on HIP
■ Coriander is still a research project with limited DL framework and HW support
ENGINEERS AND DEVICES
WORKING TOGETHER
Coriander - CUDA on OpenCL
http://www.iwocl.org/wp-content/uploads/iwocl2017-hugh-perkins-cuda-cl.pdf
Compile time Run time
ENGINEERS AND DEVICES
WORKING TOGETHER
Gaps and Challenges - Arch Agnostic (Cont.)
● The DL frameworks are fragmented
○ It is not easy to add new accelerator support, and it has to be done for each framework
■ Unified graph IR and tensor IR support for DL frameworks will be helpful
● See next page for several major graph IR implementations
○ There is no good interoperability between different DL frameworks
■ Operator and tensor sharing
● DLPack proposal seems promising, adopted by MXNet, PyTorch, Caffe2, tiny-dnn
■ Model translations between different DL frameworks
○ Applications need unified APIs for portability and flexibility
■ Keras: a Python DL library supporting TensorFlow, CNTK, Theano and MXNet backends
■ TensorFlow Lite and Neural Network API support on Android
● To support pre-trained models with other DL frameworks, translation will be
required
ENGINEERS AND DEVICES
WORKING TOGETHER
Computation Graph IR & Tensor IR
● Google XLA
○ A domain-specific compiler, which makes it easier to add
custom backends, LLVM backends can be reused
○ JIT to analyse TF graph, fuse operators at runtime
○ AOT to support TF models on memory restricted devices -
helpful for ARM devices
● MXNet NNVM & TVM
○ NNVM-Fusion plugin for GPU kernel optimization (e.g, ops
fusion, runtime compilation)
○ TinyFlow: TF Front-End + NNVM + Torch7 Back-End
○ TVM: a tensor IR/DSL stack
■ It is DLpack compatible, and reuses Halide IR
● Intel/Nervana NGraph
○ Neon is supported, integrations with TF XLA and Caffe are
ongoing
● ONNX (Open Neural Network Exchange)
○ Initiated by Facebook and Microsoft
○ Caffe2, PyTorch and CNTK will support it
Who will become the ‘LLVM’ for DL?
ENGINEERS AND DEVICES
WORKING TOGETHER
Gaps and Challenges - ARM Specific
● There are seldom ARM boards with good PCIe capability in the community
○ AMD ROCm requires PCIe Gen3 x8 or x16 with PCIe Atomics
○ High-end FPGA cards requires PCIe Gen3 x16 or Gen4 x8 (e.g, Virtex Ultrascale+ from Xilinx)
● Discrete graphics driver support on ARM platforms is quite limited
○ Nvidia Proprietary Linux GPU driver only supports Aarch32
○ AMD ROCm has support for Cavium ThunderX, but no generic ARM support
○ GPU virtualization support is also needed by ARM data centers
● There is no good OpenCL driver distribution for mobile GPUs
○ OpenCL support for 96Boards needs improving
● Optimizations for edge devices
○ Memory footprint optimization for DL frameworks
■ e.g, quantization and low precision inference support being done by Google for TensorFlow
○ Power efficiency with different accelerators
● Anything else?
ENGINEERS
AND DEVICES
WORKING
TOGETHER
Backup
ENGINEERS AND DEVICES
WORKING TOGETHER
96Boards with OpenCL support
96Boards Accelerators OpenCL version Driver Public Availability
Qualcomm DB410c Adreno 306, Hexagon QDSP6 V5 OpenCL 1.1 EP Yes for Android?
Bubblegum-96(*)
PowerVR G6230 OpenCL 1.2 EP Yes for Linux
Mediatek X20 Mali T880 MP4 OpenCL 1.2 TBD
Hisilicon Hikey960 Mali G71 MP8 OpenCL 2.0 Ongoing for Android
Hisilicon Poplar Mali T720 OpenCL 1.2 TBD
Qualcomm SD600eval Adreno 320, Hexagon QDSP6 V4 OpenCL 1.1 TBD
Socionext F-Cue Mali T624 OpenCL 1.1 TBD
Mavell Andromeda Vivante GC7000UL OpenCL 1.2 TBD
MStar Kava Mali T820 OpenCL 1.2 TBD
Altera Chameleon96 Cyclone V SoC FPGA OpenCL 1.0 EP TBD
* ACL cannot be used on Bubblegum-96 for missing required OpenCL extensions
Thank You
#SFO17
SFO17 keynotes and videos on: connect.linaro.org
For further information: www.linaro.org

More Related Content

What's hot

LAS16-207: Bus scaling QoS
LAS16-207: Bus scaling QoSLAS16-207: Bus scaling QoS
LAS16-207: Bus scaling QoS
Linaro
 
LAS16-405:OpenDataPlane: Software Defined Dataplane leader
LAS16-405:OpenDataPlane: Software Defined Dataplane leaderLAS16-405:OpenDataPlane: Software Defined Dataplane leader
LAS16-405:OpenDataPlane: Software Defined Dataplane leader
Linaro
 
BUD17-300: Journey of a packet
BUD17-300: Journey of a packetBUD17-300: Journey of a packet
BUD17-300: Journey of a packet
Linaro
 
Las16 309 - lua jit arm64 port - status
Las16 309 - lua jit arm64 port - statusLas16 309 - lua jit arm64 port - status
Las16 309 - lua jit arm64 port - status
Linaro
 
LAS16-305: Smart City Big Data Visualization on 96Boards
LAS16-305: Smart City Big Data Visualization on 96BoardsLAS16-305: Smart City Big Data Visualization on 96Boards
LAS16-305: Smart City Big Data Visualization on 96Boards
Linaro
 
BUD17-104: Scripting Languages in IoT: Challenges and Approaches
BUD17-104: Scripting Languages in IoT: Challenges and ApproachesBUD17-104: Scripting Languages in IoT: Challenges and Approaches
BUD17-104: Scripting Languages in IoT: Challenges and Approaches
Linaro
 
Ostech war story using mainline linux for an android tv bsp
Ostech  war story  using mainline linux  for an android tv bspOstech  war story  using mainline linux  for an android tv bsp
Ostech war story using mainline linux for an android tv bsp
Neil Armstrong
 
Deep Learning Neural Network Acceleration at the Edge - Andrea Gallo
Deep Learning Neural Network Acceleration at the Edge - Andrea GalloDeep Learning Neural Network Acceleration at the Edge - Andrea Gallo
Deep Learning Neural Network Acceleration at the Edge - Andrea Gallo
Linaro
 
BKK16-400B ODPI - Standardizing Hadoop
BKK16-400B ODPI - Standardizing HadoopBKK16-400B ODPI - Standardizing Hadoop
BKK16-400B ODPI - Standardizing Hadoop
Linaro
 
BKK16-103 OpenCSD - Open for Business!
BKK16-103 OpenCSD - Open for Business!BKK16-103 OpenCSD - Open for Business!
BKK16-103 OpenCSD - Open for Business!
Linaro
 
LAS16-210: Hardware Assisted Tracing on ARM with CoreSight and OpenCSD
LAS16-210: Hardware Assisted Tracing on ARM with CoreSight and OpenCSDLAS16-210: Hardware Assisted Tracing on ARM with CoreSight and OpenCSD
LAS16-210: Hardware Assisted Tracing on ARM with CoreSight and OpenCSD
Linaro
 
LAS16-200: SCMI - System Management and Control Interface
LAS16-200:  SCMI - System Management and Control InterfaceLAS16-200:  SCMI - System Management and Control Interface
LAS16-200: SCMI - System Management and Control Interface
Linaro
 
BKK16-305B ILP32 Performance on AArch64
BKK16-305B ILP32 Performance on AArch64BKK16-305B ILP32 Performance on AArch64
BKK16-305B ILP32 Performance on AArch64
Linaro
 
LAS16-507: LXC support in LAVA
LAS16-507: LXC support in LAVALAS16-507: LXC support in LAVA
LAS16-507: LXC support in LAVA
Linaro
 
Evaluating GPU programming Models for the LUMI Supercomputer
Evaluating GPU programming Models for the LUMI SupercomputerEvaluating GPU programming Models for the LUMI Supercomputer
Evaluating GPU programming Models for the LUMI Supercomputer
George Markomanolis
 
BUD17 Socionext SC2A11 ARM Server SoC
BUD17 Socionext SC2A11 ARM Server SoCBUD17 Socionext SC2A11 ARM Server SoC
BUD17 Socionext SC2A11 ARM Server SoC
Linaro
 
Hands on OpenCL
Hands on OpenCLHands on OpenCL
Hands on OpenCL
Vladimir Starostenkov
 
Linaro Connect San Francisco 2017 - Welcome Keynote by George Grey | #SFO17
Linaro Connect San Francisco 2017 - Welcome Keynote by George Grey | #SFO17Linaro Connect San Francisco 2017 - Welcome Keynote by George Grey | #SFO17
Linaro Connect San Francisco 2017 - Welcome Keynote by George Grey | #SFO17
Linaro
 
JerryScript: An ultra-lighteweight JavaScript Engine for the Internet of Things
JerryScript: An ultra-lighteweight JavaScript Engine for the Internet of ThingsJerryScript: An ultra-lighteweight JavaScript Engine for the Internet of Things
JerryScript: An ultra-lighteweight JavaScript Engine for the Internet of Things
Samsung Open Source Group
 
LAS16-500: The Rise and Fall of Assembler and the VGIC from Hell
LAS16-500: The Rise and Fall of Assembler and the VGIC from HellLAS16-500: The Rise and Fall of Assembler and the VGIC from Hell
LAS16-500: The Rise and Fall of Assembler and the VGIC from Hell
Linaro
 

What's hot (20)

LAS16-207: Bus scaling QoS
LAS16-207: Bus scaling QoSLAS16-207: Bus scaling QoS
LAS16-207: Bus scaling QoS
 
LAS16-405:OpenDataPlane: Software Defined Dataplane leader
LAS16-405:OpenDataPlane: Software Defined Dataplane leaderLAS16-405:OpenDataPlane: Software Defined Dataplane leader
LAS16-405:OpenDataPlane: Software Defined Dataplane leader
 
BUD17-300: Journey of a packet
BUD17-300: Journey of a packetBUD17-300: Journey of a packet
BUD17-300: Journey of a packet
 
Las16 309 - lua jit arm64 port - status
Las16 309 - lua jit arm64 port - statusLas16 309 - lua jit arm64 port - status
Las16 309 - lua jit arm64 port - status
 
LAS16-305: Smart City Big Data Visualization on 96Boards
LAS16-305: Smart City Big Data Visualization on 96BoardsLAS16-305: Smart City Big Data Visualization on 96Boards
LAS16-305: Smart City Big Data Visualization on 96Boards
 
BUD17-104: Scripting Languages in IoT: Challenges and Approaches
BUD17-104: Scripting Languages in IoT: Challenges and ApproachesBUD17-104: Scripting Languages in IoT: Challenges and Approaches
BUD17-104: Scripting Languages in IoT: Challenges and Approaches
 
Ostech war story using mainline linux for an android tv bsp
Ostech  war story  using mainline linux  for an android tv bspOstech  war story  using mainline linux  for an android tv bsp
Ostech war story using mainline linux for an android tv bsp
 
Deep Learning Neural Network Acceleration at the Edge - Andrea Gallo
Deep Learning Neural Network Acceleration at the Edge - Andrea GalloDeep Learning Neural Network Acceleration at the Edge - Andrea Gallo
Deep Learning Neural Network Acceleration at the Edge - Andrea Gallo
 
BKK16-400B ODPI - Standardizing Hadoop
BKK16-400B ODPI - Standardizing HadoopBKK16-400B ODPI - Standardizing Hadoop
BKK16-400B ODPI - Standardizing Hadoop
 
BKK16-103 OpenCSD - Open for Business!
BKK16-103 OpenCSD - Open for Business!BKK16-103 OpenCSD - Open for Business!
BKK16-103 OpenCSD - Open for Business!
 
LAS16-210: Hardware Assisted Tracing on ARM with CoreSight and OpenCSD
LAS16-210: Hardware Assisted Tracing on ARM with CoreSight and OpenCSDLAS16-210: Hardware Assisted Tracing on ARM with CoreSight and OpenCSD
LAS16-210: Hardware Assisted Tracing on ARM with CoreSight and OpenCSD
 
LAS16-200: SCMI - System Management and Control Interface
LAS16-200:  SCMI - System Management and Control InterfaceLAS16-200:  SCMI - System Management and Control Interface
LAS16-200: SCMI - System Management and Control Interface
 
BKK16-305B ILP32 Performance on AArch64
BKK16-305B ILP32 Performance on AArch64BKK16-305B ILP32 Performance on AArch64
BKK16-305B ILP32 Performance on AArch64
 
LAS16-507: LXC support in LAVA
LAS16-507: LXC support in LAVALAS16-507: LXC support in LAVA
LAS16-507: LXC support in LAVA
 
Evaluating GPU programming Models for the LUMI Supercomputer
Evaluating GPU programming Models for the LUMI SupercomputerEvaluating GPU programming Models for the LUMI Supercomputer
Evaluating GPU programming Models for the LUMI Supercomputer
 
BUD17 Socionext SC2A11 ARM Server SoC
BUD17 Socionext SC2A11 ARM Server SoCBUD17 Socionext SC2A11 ARM Server SoC
BUD17 Socionext SC2A11 ARM Server SoC
 
Hands on OpenCL
Hands on OpenCLHands on OpenCL
Hands on OpenCL
 
Linaro Connect San Francisco 2017 - Welcome Keynote by George Grey | #SFO17
Linaro Connect San Francisco 2017 - Welcome Keynote by George Grey | #SFO17Linaro Connect San Francisco 2017 - Welcome Keynote by George Grey | #SFO17
Linaro Connect San Francisco 2017 - Welcome Keynote by George Grey | #SFO17
 
JerryScript: An ultra-lighteweight JavaScript Engine for the Internet of Things
JerryScript: An ultra-lighteweight JavaScript Engine for the Internet of ThingsJerryScript: An ultra-lighteweight JavaScript Engine for the Internet of Things
JerryScript: An ultra-lighteweight JavaScript Engine for the Internet of Things
 
LAS16-500: The Rise and Fall of Assembler and the VGIC from Hell
LAS16-500: The Rise and Fall of Assembler and the VGIC from HellLAS16-500: The Rise and Fall of Assembler and the VGIC from Hell
LAS16-500: The Rise and Fall of Assembler and the VGIC from Hell
 

Similar to Deep Learning on ARM Platforms - SFO17-509

LCU13: GPGPU on ARM Experience Report
LCU13: GPGPU on ARM Experience ReportLCU13: GPGPU on ARM Experience Report
LCU13: GPGPU on ARM Experience Report
Linaro
 
Bsdtw17: johannes m dieterich: high performance computing and gpu acceleratio...
Bsdtw17: johannes m dieterich: high performance computing and gpu acceleratio...Bsdtw17: johannes m dieterich: high performance computing and gpu acceleratio...
Bsdtw17: johannes m dieterich: high performance computing and gpu acceleratio...
Scott Tsai
 
Linux Stammtisch Munich: Ceph - Overview, Experiences and Outlook
Linux Stammtisch Munich: Ceph - Overview, Experiences and OutlookLinux Stammtisch Munich: Ceph - Overview, Experiences and Outlook
Linux Stammtisch Munich: Ceph - Overview, Experiences and Outlook
Danny Al-Gaaf
 
"Making Computer Vision Software Run Fast on Your Embedded Platform," a Prese...
"Making Computer Vision Software Run Fast on Your Embedded Platform," a Prese..."Making Computer Vision Software Run Fast on Your Embedded Platform," a Prese...
"Making Computer Vision Software Run Fast on Your Embedded Platform," a Prese...
Edge AI and Vision Alliance
 
A Journey into Hexagon: Dissecting Qualcomm Basebands
A Journey into Hexagon: Dissecting Qualcomm BasebandsA Journey into Hexagon: Dissecting Qualcomm Basebands
A Journey into Hexagon: Dissecting Qualcomm Basebands
Priyanka Aash
 
GPU Programming with Java
GPU Programming with JavaGPU Programming with Java
GPU Programming with Java
Kelum Senanayake
 
LMG Lightning Talks - SFO17-205
LMG Lightning Talks - SFO17-205LMG Lightning Talks - SFO17-205
LMG Lightning Talks - SFO17-205
Linaro
 
AI Hardware Landscape 2021
AI Hardware Landscape 2021AI Hardware Landscape 2021
AI Hardware Landscape 2021
Grigory Sapunov
 
Introduction to HPC & Supercomputing in AI
Introduction to HPC & Supercomputing in AIIntroduction to HPC & Supercomputing in AI
Introduction to HPC & Supercomputing in AI
Tyrone Systems
 
LAS16-402: ARM Trusted Firmware – from Enterprise to Embedded
LAS16-402: ARM Trusted Firmware – from Enterprise to EmbeddedLAS16-402: ARM Trusted Firmware – from Enterprise to Embedded
LAS16-402: ARM Trusted Firmware – from Enterprise to Embedded
Linaro
 
Embedded platform choices
Embedded platform choicesEmbedded platform choices
Embedded platform choices
Tavish Naruka
 
Heterogeneous multiprocessing on androd and i.mx7
Heterogeneous multiprocessing on androd and i.mx7Heterogeneous multiprocessing on androd and i.mx7
Heterogeneous multiprocessing on androd and i.mx7
Kynetics
 
Machine Learning Developers - Know your GPUs
Machine Learning Developers - Know your GPUsMachine Learning Developers - Know your GPUs
Machine Learning Developers - Know your GPUs
Amazon Web Services
 
Nvidia at SEMICon, Munich
Nvidia at SEMICon, MunichNvidia at SEMICon, Munich
Nvidia at SEMICon, Munich
Alison B. Lowndes
 
High-Performance Computing with C++
High-Performance Computing with C++High-Performance Computing with C++
High-Performance Computing with C++
JetBrains
 
LAS16-TR06: Remoteproc & rpmsg development
LAS16-TR06: Remoteproc & rpmsg developmentLAS16-TR06: Remoteproc & rpmsg development
LAS16-TR06: Remoteproc & rpmsg development
Linaro
 
State of ARM-based HPC
State of ARM-based HPCState of ARM-based HPC
State of ARM-based HPC
inside-BigData.com
 
Leveraging open source for large scale analytics
Leveraging open source for large scale analyticsLeveraging open source for large scale analytics
Leveraging open source for large scale analytics
South West Data Meetup
 
"New Standards for Embedded Vision and Neural Networks," a Presentation from ...
"New Standards for Embedded Vision and Neural Networks," a Presentation from ..."New Standards for Embedded Vision and Neural Networks," a Presentation from ...
"New Standards for Embedded Vision and Neural Networks," a Presentation from ...
Edge AI and Vision Alliance
 
Cuda
CudaCuda

Similar to Deep Learning on ARM Platforms - SFO17-509 (20)

LCU13: GPGPU on ARM Experience Report
LCU13: GPGPU on ARM Experience ReportLCU13: GPGPU on ARM Experience Report
LCU13: GPGPU on ARM Experience Report
 
Bsdtw17: johannes m dieterich: high performance computing and gpu acceleratio...
Bsdtw17: johannes m dieterich: high performance computing and gpu acceleratio...Bsdtw17: johannes m dieterich: high performance computing and gpu acceleratio...
Bsdtw17: johannes m dieterich: high performance computing and gpu acceleratio...
 
Linux Stammtisch Munich: Ceph - Overview, Experiences and Outlook
Linux Stammtisch Munich: Ceph - Overview, Experiences and OutlookLinux Stammtisch Munich: Ceph - Overview, Experiences and Outlook
Linux Stammtisch Munich: Ceph - Overview, Experiences and Outlook
 
"Making Computer Vision Software Run Fast on Your Embedded Platform," a Prese...
"Making Computer Vision Software Run Fast on Your Embedded Platform," a Prese..."Making Computer Vision Software Run Fast on Your Embedded Platform," a Prese...
"Making Computer Vision Software Run Fast on Your Embedded Platform," a Prese...
 
A Journey into Hexagon: Dissecting Qualcomm Basebands
A Journey into Hexagon: Dissecting Qualcomm BasebandsA Journey into Hexagon: Dissecting Qualcomm Basebands
A Journey into Hexagon: Dissecting Qualcomm Basebands
 
GPU Programming with Java
GPU Programming with JavaGPU Programming with Java
GPU Programming with Java
 
LMG Lightning Talks - SFO17-205
LMG Lightning Talks - SFO17-205LMG Lightning Talks - SFO17-205
LMG Lightning Talks - SFO17-205
 
AI Hardware Landscape 2021
AI Hardware Landscape 2021AI Hardware Landscape 2021
AI Hardware Landscape 2021
 
Introduction to HPC & Supercomputing in AI
Introduction to HPC & Supercomputing in AIIntroduction to HPC & Supercomputing in AI
Introduction to HPC & Supercomputing in AI
 
LAS16-402: ARM Trusted Firmware – from Enterprise to Embedded
LAS16-402: ARM Trusted Firmware – from Enterprise to EmbeddedLAS16-402: ARM Trusted Firmware – from Enterprise to Embedded
LAS16-402: ARM Trusted Firmware – from Enterprise to Embedded
 
Embedded platform choices
Embedded platform choicesEmbedded platform choices
Embedded platform choices
 
Heterogeneous multiprocessing on androd and i.mx7
Heterogeneous multiprocessing on androd and i.mx7Heterogeneous multiprocessing on androd and i.mx7
Heterogeneous multiprocessing on androd and i.mx7
 
Machine Learning Developers - Know your GPUs
Machine Learning Developers - Know your GPUsMachine Learning Developers - Know your GPUs
Machine Learning Developers - Know your GPUs
 
Nvidia at SEMICon, Munich
Nvidia at SEMICon, MunichNvidia at SEMICon, Munich
Nvidia at SEMICon, Munich
 
High-Performance Computing with C++
High-Performance Computing with C++High-Performance Computing with C++
High-Performance Computing with C++
 
LAS16-TR06: Remoteproc & rpmsg development
LAS16-TR06: Remoteproc & rpmsg developmentLAS16-TR06: Remoteproc & rpmsg development
LAS16-TR06: Remoteproc & rpmsg development
 
State of ARM-based HPC
State of ARM-based HPCState of ARM-based HPC
State of ARM-based HPC
 
Leveraging open source for large scale analytics
Leveraging open source for large scale analyticsLeveraging open source for large scale analytics
Leveraging open source for large scale analytics
 
"New Standards for Embedded Vision and Neural Networks," a Presentation from ...
"New Standards for Embedded Vision and Neural Networks," a Presentation from ..."New Standards for Embedded Vision and Neural Networks," a Presentation from ...
"New Standards for Embedded Vision and Neural Networks," a Presentation from ...
 
Cuda
CudaCuda
Cuda
 

More from Linaro

Arm Architecture HPC Workshop Santa Clara 2018 - Kanta Vekaria
Arm Architecture HPC Workshop Santa Clara 2018 - Kanta VekariaArm Architecture HPC Workshop Santa Clara 2018 - Kanta Vekaria
Arm Architecture HPC Workshop Santa Clara 2018 - Kanta Vekaria
Linaro
 
Huawei’s requirements for the ARM based HPC solution readiness - Joshua Mora
Huawei’s requirements for the ARM based HPC solution readiness - Joshua MoraHuawei’s requirements for the ARM based HPC solution readiness - Joshua Mora
Huawei’s requirements for the ARM based HPC solution readiness - Joshua Mora
Linaro
 
Bud17 113: distribution ci using qemu and open qa
Bud17 113: distribution ci using qemu and open qaBud17 113: distribution ci using qemu and open qa
Bud17 113: distribution ci using qemu and open qa
Linaro
 
OpenHPC Automation with Ansible - Renato Golin - Linaro Arm HPC Workshop 2018
OpenHPC Automation with Ansible - Renato Golin - Linaro Arm HPC Workshop 2018OpenHPC Automation with Ansible - Renato Golin - Linaro Arm HPC Workshop 2018
OpenHPC Automation with Ansible - Renato Golin - Linaro Arm HPC Workshop 2018
Linaro
 
HPC network stack on ARM - Linaro HPC Workshop 2018
HPC network stack on ARM - Linaro HPC Workshop 2018HPC network stack on ARM - Linaro HPC Workshop 2018
HPC network stack on ARM - Linaro HPC Workshop 2018
Linaro
 
It just keeps getting better - SUSE enablement for Arm - Linaro HPC Workshop ...
It just keeps getting better - SUSE enablement for Arm - Linaro HPC Workshop ...It just keeps getting better - SUSE enablement for Arm - Linaro HPC Workshop ...
It just keeps getting better - SUSE enablement for Arm - Linaro HPC Workshop ...
Linaro
 
Intelligent Interconnect Architecture to Enable Next Generation HPC - Linaro ...
Intelligent Interconnect Architecture to Enable Next Generation HPC - Linaro ...Intelligent Interconnect Architecture to Enable Next Generation HPC - Linaro ...
Intelligent Interconnect Architecture to Enable Next Generation HPC - Linaro ...
Linaro
 
Yutaka Ishikawa - Post-K and Arm HPC Ecosystem - Linaro Arm HPC Workshop Sant...
Yutaka Ishikawa - Post-K and Arm HPC Ecosystem - Linaro Arm HPC Workshop Sant...Yutaka Ishikawa - Post-K and Arm HPC Ecosystem - Linaro Arm HPC Workshop Sant...
Yutaka Ishikawa - Post-K and Arm HPC Ecosystem - Linaro Arm HPC Workshop Sant...
Linaro
 
Andrew J Younge - Vanguard Astra - Petascale Arm Platform for U.S. DOE/ASC Su...
Andrew J Younge - Vanguard Astra - Petascale Arm Platform for U.S. DOE/ASC Su...Andrew J Younge - Vanguard Astra - Petascale Arm Platform for U.S. DOE/ASC Su...
Andrew J Younge - Vanguard Astra - Petascale Arm Platform for U.S. DOE/ASC Su...
Linaro
 
HKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainline
HKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainlineHKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainline
HKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainline
Linaro
 
HKG18-100K1 - George Grey: Opening Keynote
HKG18-100K1 - George Grey: Opening KeynoteHKG18-100K1 - George Grey: Opening Keynote
HKG18-100K1 - George Grey: Opening Keynote
Linaro
 
HKG18-318 - OpenAMP Workshop
HKG18-318 - OpenAMP WorkshopHKG18-318 - OpenAMP Workshop
HKG18-318 - OpenAMP Workshop
Linaro
 
HKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainline
HKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainlineHKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainline
HKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainline
Linaro
 
HKG18-315 - Why the ecosystem is a wonderful thing, warts and all
HKG18-315 - Why the ecosystem is a wonderful thing, warts and allHKG18-315 - Why the ecosystem is a wonderful thing, warts and all
HKG18-315 - Why the ecosystem is a wonderful thing, warts and all
Linaro
 
HKG18- 115 - Partitioning ARM Systems with the Jailhouse Hypervisor
HKG18- 115 - Partitioning ARM Systems with the Jailhouse HypervisorHKG18- 115 - Partitioning ARM Systems with the Jailhouse Hypervisor
HKG18- 115 - Partitioning ARM Systems with the Jailhouse Hypervisor
Linaro
 
HKG18-TR08 - Upstreaming SVE in QEMU
HKG18-TR08 - Upstreaming SVE in QEMUHKG18-TR08 - Upstreaming SVE in QEMU
HKG18-TR08 - Upstreaming SVE in QEMU
Linaro
 
HKG18-113- Secure Data Path work with i.MX8M
HKG18-113- Secure Data Path work with i.MX8MHKG18-113- Secure Data Path work with i.MX8M
HKG18-113- Secure Data Path work with i.MX8M
Linaro
 
HKG18-120 - Devicetree Schema Documentation and Validation
HKG18-120 - Devicetree Schema Documentation and Validation HKG18-120 - Devicetree Schema Documentation and Validation
HKG18-120 - Devicetree Schema Documentation and Validation
Linaro
 
HKG18-223 - Trusted FirmwareM: Trusted boot
HKG18-223 - Trusted FirmwareM: Trusted bootHKG18-223 - Trusted FirmwareM: Trusted boot
HKG18-223 - Trusted FirmwareM: Trusted boot
Linaro
 
HKG18-500K1 - Keynote: Dileep Bhandarkar - Emerging Computing Trends in the D...
HKG18-500K1 - Keynote: Dileep Bhandarkar - Emerging Computing Trends in the D...HKG18-500K1 - Keynote: Dileep Bhandarkar - Emerging Computing Trends in the D...
HKG18-500K1 - Keynote: Dileep Bhandarkar - Emerging Computing Trends in the D...
Linaro
 

More from Linaro (20)

Arm Architecture HPC Workshop Santa Clara 2018 - Kanta Vekaria
Arm Architecture HPC Workshop Santa Clara 2018 - Kanta VekariaArm Architecture HPC Workshop Santa Clara 2018 - Kanta Vekaria
Arm Architecture HPC Workshop Santa Clara 2018 - Kanta Vekaria
 
Huawei’s requirements for the ARM based HPC solution readiness - Joshua Mora
Huawei’s requirements for the ARM based HPC solution readiness - Joshua MoraHuawei’s requirements for the ARM based HPC solution readiness - Joshua Mora
Huawei’s requirements for the ARM based HPC solution readiness - Joshua Mora
 
Bud17 113: distribution ci using qemu and open qa
Bud17 113: distribution ci using qemu and open qaBud17 113: distribution ci using qemu and open qa
Bud17 113: distribution ci using qemu and open qa
 
OpenHPC Automation with Ansible - Renato Golin - Linaro Arm HPC Workshop 2018
OpenHPC Automation with Ansible - Renato Golin - Linaro Arm HPC Workshop 2018OpenHPC Automation with Ansible - Renato Golin - Linaro Arm HPC Workshop 2018
OpenHPC Automation with Ansible - Renato Golin - Linaro Arm HPC Workshop 2018
 
HPC network stack on ARM - Linaro HPC Workshop 2018
HPC network stack on ARM - Linaro HPC Workshop 2018HPC network stack on ARM - Linaro HPC Workshop 2018
HPC network stack on ARM - Linaro HPC Workshop 2018
 
It just keeps getting better - SUSE enablement for Arm - Linaro HPC Workshop ...
It just keeps getting better - SUSE enablement for Arm - Linaro HPC Workshop ...It just keeps getting better - SUSE enablement for Arm - Linaro HPC Workshop ...
It just keeps getting better - SUSE enablement for Arm - Linaro HPC Workshop ...
 
Intelligent Interconnect Architecture to Enable Next Generation HPC - Linaro ...
Intelligent Interconnect Architecture to Enable Next Generation HPC - Linaro ...Intelligent Interconnect Architecture to Enable Next Generation HPC - Linaro ...
Intelligent Interconnect Architecture to Enable Next Generation HPC - Linaro ...
 
Yutaka Ishikawa - Post-K and Arm HPC Ecosystem - Linaro Arm HPC Workshop Sant...
Yutaka Ishikawa - Post-K and Arm HPC Ecosystem - Linaro Arm HPC Workshop Sant...Yutaka Ishikawa - Post-K and Arm HPC Ecosystem - Linaro Arm HPC Workshop Sant...
Yutaka Ishikawa - Post-K and Arm HPC Ecosystem - Linaro Arm HPC Workshop Sant...
 
Andrew J Younge - Vanguard Astra - Petascale Arm Platform for U.S. DOE/ASC Su...
Andrew J Younge - Vanguard Astra - Petascale Arm Platform for U.S. DOE/ASC Su...Andrew J Younge - Vanguard Astra - Petascale Arm Platform for U.S. DOE/ASC Su...
Andrew J Younge - Vanguard Astra - Petascale Arm Platform for U.S. DOE/ASC Su...
 
HKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainline
HKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainlineHKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainline
HKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainline
 
HKG18-100K1 - George Grey: Opening Keynote
HKG18-100K1 - George Grey: Opening KeynoteHKG18-100K1 - George Grey: Opening Keynote
HKG18-100K1 - George Grey: Opening Keynote
 
HKG18-318 - OpenAMP Workshop
HKG18-318 - OpenAMP WorkshopHKG18-318 - OpenAMP Workshop
HKG18-318 - OpenAMP Workshop
 
HKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainline
HKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainlineHKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainline
HKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainline
 
HKG18-315 - Why the ecosystem is a wonderful thing, warts and all
HKG18-315 - Why the ecosystem is a wonderful thing, warts and allHKG18-315 - Why the ecosystem is a wonderful thing, warts and all
HKG18-315 - Why the ecosystem is a wonderful thing, warts and all
 
HKG18- 115 - Partitioning ARM Systems with the Jailhouse Hypervisor
HKG18- 115 - Partitioning ARM Systems with the Jailhouse HypervisorHKG18- 115 - Partitioning ARM Systems with the Jailhouse Hypervisor
HKG18- 115 - Partitioning ARM Systems with the Jailhouse Hypervisor
 
HKG18-TR08 - Upstreaming SVE in QEMU
HKG18-TR08 - Upstreaming SVE in QEMUHKG18-TR08 - Upstreaming SVE in QEMU
HKG18-TR08 - Upstreaming SVE in QEMU
 
HKG18-113- Secure Data Path work with i.MX8M
HKG18-113- Secure Data Path work with i.MX8MHKG18-113- Secure Data Path work with i.MX8M
HKG18-113- Secure Data Path work with i.MX8M
 
HKG18-120 - Devicetree Schema Documentation and Validation
HKG18-120 - Devicetree Schema Documentation and Validation HKG18-120 - Devicetree Schema Documentation and Validation
HKG18-120 - Devicetree Schema Documentation and Validation
 
HKG18-223 - Trusted FirmwareM: Trusted boot
HKG18-223 - Trusted FirmwareM: Trusted bootHKG18-223 - Trusted FirmwareM: Trusted boot
HKG18-223 - Trusted FirmwareM: Trusted boot
 
HKG18-500K1 - Keynote: Dileep Bhandarkar - Emerging Computing Trends in the D...
HKG18-500K1 - Keynote: Dileep Bhandarkar - Emerging Computing Trends in the D...HKG18-500K1 - Keynote: Dileep Bhandarkar - Emerging Computing Trends in the D...
HKG18-500K1 - Keynote: Dileep Bhandarkar - Emerging Computing Trends in the D...
 

Recently uploaded

GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
James Anderson
 
DevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA ConnectDevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA Connect
Kari Kakkonen
 
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdfFIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance
 
The Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and SalesThe Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and Sales
Laura Byrne
 
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Jeffrey Haguewood
 
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
Product School
 
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdfFIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance
 
The Future of Platform Engineering
The Future of Platform EngineeringThe Future of Platform Engineering
The Future of Platform Engineering
Jemma Hussein Allen
 
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Albert Hoitingh
 
PCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase TeamPCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase Team
ControlCase
 
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Product School
 
Knowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and backKnowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and back
Elena Simperl
 
How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...
Product School
 
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdfFIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance
 
Essentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with ParametersEssentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with Parameters
Safe Software
 
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
Product School
 
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
BookNet Canada
 
When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...
Elena Simperl
 
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
UiPathCommunity
 
Assuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyesAssuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyes
ThousandEyes
 

Recently uploaded (20)

GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
 
DevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA ConnectDevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA Connect
 
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdfFIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
 
The Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and SalesThe Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and Sales
 
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
 
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
 
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdfFIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdf
 
The Future of Platform Engineering
The Future of Platform EngineeringThe Future of Platform Engineering
The Future of Platform Engineering
 
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
 
PCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase TeamPCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase Team
 
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
 
Knowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and backKnowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and back
 
How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...
 
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdfFIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
 
Essentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with ParametersEssentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with Parameters
 
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
 
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
 
When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...
 
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
 
Assuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyesAssuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyes
 

Deep Learning on ARM Platforms - SFO17-509

  • 1. SFO17-509 Deep Learning on ARM Platforms - from the platform angle Jammy Zhou - Linaro
  • 2. ENGINEERS AND DEVICES WORKING TOGETHER Agenda ● Deep learning basics ● Platform overview ● Gaps and challenges
  • 3. ENGINEERS AND DEVICES WORKING TOGETHER Basic Elements Algorithms & Models CNN, R-CNN, RNN, LSTM, Feed Forward, BP, SGD, Deep Reinforcement learning, etc AlexNet, VGG, GoogLeNet, ResNet, SqueezeNet, SyntaxNet, MobileNet, SSD, YOLO, etc Computing Platforms CPU, GPU, FPGA, TPU, etc Deep learning software stack Datasets MNIST, ImageNet, CIFAR10, CIFAR100, AudioSet, DET, CLS-LOC, ActivityNet, etc + + Deep Learning
  • 4. ENGINEERS AND DEVICES WORKING TOGETHER ● Training on cloud & data center (desktop workstation can be used as well for development) ● Inference on cloud & data center or edge device Training & Inference Cloud and Data Center Edge Devices Dataset & Labels DL Framework Trained Model Device1 Device2 New Data Inference Result Trained Model DL Framework New Data Inference Result Batch processing for large dataset Untrained Model DL Framework
  • 5. ENGINEERS AND DEVICES WORKING TOGETHER Platform Overview Python Apps Frameworks Drivers Libraries Hardware Accelerators TensorFlowCaffe MXNet Theano Caffe2 Torch7 CNTK Paddle... C++ Apps Go Apps Java Apps... BLAS FFT RNG SPARSE Eigen... Application support for multiple frameworks? Framework support for multiple accelerator solutions?
  • 6. ENGINEERS AND DEVICES WORKING TOGETHER Heterogeneous Accelerators ● Nvidia ○ Tesla, Quadro, GeForce, and etc ○ Drive PX for self driving ○ Jetson TK1/TX1/TX2 for embedded ● AMD ○ Radeon Instinct, Radeon Pro ○ Embedded G series ● Mobile GPUs ○ ARM Mali ○ Qualcomm Adreno ○ Imagination PowerVR ○ VeriSilicon Vivante ● Google TPU ● Intel Nervana ● Nvidia Xavier DLA ● Horizon Robotics (地平线) BPU ● Cambricon (寒武纪) NPU ● Fujitsu DLU ● GraphCore IPU ● Intel/Altera Stratix, Arria, Cyclone ● Xilinx Ultrascale, Ultrascale+ ● Deephi Tech (深鉴科技) DPU (Xilinx based) ● Baidu XPU (Xilinx based) ● Qualcomm Hexagon ● CEVA XM ● TI C66x ● Cadence Tensilica Vision GPU ASIC FPGA DSP
  • 7. ENGINEERS AND DEVICES WORKING TOGETHER Interface & Libraries ● Nvidia CUDA ○ Proprietary (including the driver stack) ○ Widely supported by the DL frameworks and the industry ● AMD HIP ○ Fully open source together with the ROCm driver stack ○ Support both NVCC (for CUDA) and HCC compiler backends ● Khronos Standards ○ OpenCL ■ Support not only GPU, but also FPGA, DSP and other accelerators ○ OpenVX with Neural Network Extension ■ Acceleration for image processing and computer vision applications ○ Vulkan ■ Designed for both graphics and compute, mainly used for graphics now ● Nvidia ○ cuDNN, cuBLAS, cuSPARSE, etc ● AMD ○ MIOpen, hipBLAS/rocBLAS, hcRNG, hcFFT/rocFFT, etc ● OpenCL ○ AMD clBLAS, clFFT, clRNG, clSPARSE ○ ARM Compute Library, requires: ■ cl_arm_non_uniform_work_group_size ■ cl_khr_fp16 ○ Qualcomm Neural Processing Engine ● Others ○ Eigen - C++ template library for LA ○ OpenBLAS, NNPack, MKL, etc ○ OpenCV - Open source CV library ■ Have Halide based DNN module ○ Xilinx DNN and GEMM libraries
  • 8. ENGINEERS AND DEVICES WORKING TOGETHER Frameworks Vendor Languages CUDA HIP OpenCL ACL NPE[1] Caffe BVLC C++, Python, Matlab Yes Yes [2] Yes [2] Yes [2] Yes Caffe2 Facebook C++, Python Yes Yes [2] No No Yes TensorFlow Google Python, C++, JAVA, Go, etc Yes No [3] Yes [2] No Yes MXNet Apache Python, R, C++, Perl, etc Yes Yes [2] No Yes [2] No Torch Community Lua, Python, Matlab Yes Yes [2] Yes [2] No No Theano Montreal Python Yes No Yes [4] No No CNTK Microsoft Python, C++, C#, .Net, etc Yes No [3] No No No Paddle Baidu Python, Go Yes No No No No [1] The trained models are converted to DLC format for execution [2] Out-of-tree support, by vendors and community [3] Under development [4] Seems incomplete and buggy
  • 9. ENGINEERS AND DEVICES WORKING TOGETHER Benchmark and Testing ● Collective Knowledge SDK for AI ○ Automate benchmarking, optimization, co-design the whole deep learning stack to satisfy different requirements (e.g, execution time, accuracy, power consumption, memory usage, etc) ○ Have portable and customizable workflow framework ○ CK modules with open & unified JSON APIs are used to abstract access to changing SW & HW ○ Image classification and compiler flag prediction are supported now ● DeepBench from Baidu ○ Benchmark basic low-level operations on different hardwares for deep learning ● Convnet-benchmarks
  • 10. ENGINEERS AND DEVICES WORKING TOGETHER Gaps and Challenges - Arch Agnostic ● Vendor specific APIs are used for heterogeneous accelerators ○ Nvidia CUDA, AMD HIP, Google TPU StreamExecutor, and etc ○ Can OpenCL or something else be a competitive solution to rule the other world? ● OpenCL support is quited limited so far ○ There is no official OpenCL acceleration support by default for almost all major DL frameworks, although some of them have out-of-tree experimental support. ■ Can we get OpenCL support in upstream DL frameworks to reduce maintain effort? ■ Shall we enable and optimize OpenCL support for more DL frameworks? ○ Can Coriander be a solution to use existing CUDA backend of DL frameworks on OpenCL? ■ AMD HIP has similar mechanism to support CUDA on HIP ■ Coriander is still a research project with limited DL framework and HW support
  • 11. ENGINEERS AND DEVICES WORKING TOGETHER Coriander - CUDA on OpenCL http://www.iwocl.org/wp-content/uploads/iwocl2017-hugh-perkins-cuda-cl.pdf Compile time Run time
  • 12. ENGINEERS AND DEVICES WORKING TOGETHER Gaps and Challenges - Arch Agnostic (Cont.) ● The DL frameworks are fragmented ○ It is not easy to add new accelerator support, and it has to be done for each framework ■ Unified graph IR and tensor IR support for DL frameworks will be helpful ● See next page for several major graph IR implementations ○ There is no good interoperability between different DL frameworks ■ Operator and tensor sharing ● DLPack proposal seems promising, adopted by MXNet, PyTorch, Caffe2, tiny-dnn ■ Model translations between different DL frameworks ○ Applications need unified APIs for portability and flexibility ■ Keras: a Python DL library supporting TensorFlow, CNTK, Theano and MXNet backends ■ TensorFlow Lite and Neural Network API support on Android ● To support pre-trained models with other DL frameworks, translation will be required
  • 13. ENGINEERS AND DEVICES WORKING TOGETHER Computation Graph IR & Tensor IR ● Google XLA ○ A domain-specific compiler, which makes it easier to add custom backends, LLVM backends can be reused ○ JIT to analyse TF graph, fuse operators at runtime ○ AOT to support TF models on memory restricted devices - helpful for ARM devices ● MXNet NNVM & TVM ○ NNVM-Fusion plugin for GPU kernel optimization (e.g, ops fusion, runtime compilation) ○ TinyFlow: TF Front-End + NNVM + Torch7 Back-End ○ TVM: a tensor IR/DSL stack ■ It is DLpack compatible, and reuses Halide IR ● Intel/Nervana NGraph ○ Neon is supported, integrations with TF XLA and Caffe are ongoing ● ONNX (Open Neural Network Exchange) ○ Initiated by Facebook and Microsoft ○ Caffe2, PyTorch and CNTK will support it Who will become the ‘LLVM’ for DL?
  • 14. ENGINEERS AND DEVICES WORKING TOGETHER Gaps and Challenges - ARM Specific ● There are seldom ARM boards with good PCIe capability in the community ○ AMD ROCm requires PCIe Gen3 x8 or x16 with PCIe Atomics ○ High-end FPGA cards requires PCIe Gen3 x16 or Gen4 x8 (e.g, Virtex Ultrascale+ from Xilinx) ● Discrete graphics driver support on ARM platforms is quite limited ○ Nvidia Proprietary Linux GPU driver only supports Aarch32 ○ AMD ROCm has support for Cavium ThunderX, but no generic ARM support ○ GPU virtualization support is also needed by ARM data centers ● There is no good OpenCL driver distribution for mobile GPUs ○ OpenCL support for 96Boards needs improving ● Optimizations for edge devices ○ Memory footprint optimization for DL frameworks ■ e.g, quantization and low precision inference support being done by Google for TensorFlow ○ Power efficiency with different accelerators ● Anything else?
  • 16. ENGINEERS AND DEVICES WORKING TOGETHER 96Boards with OpenCL support 96Boards Accelerators OpenCL version Driver Public Availability Qualcomm DB410c Adreno 306, Hexagon QDSP6 V5 OpenCL 1.1 EP Yes for Android? Bubblegum-96(*) PowerVR G6230 OpenCL 1.2 EP Yes for Linux Mediatek X20 Mali T880 MP4 OpenCL 1.2 TBD Hisilicon Hikey960 Mali G71 MP8 OpenCL 2.0 Ongoing for Android Hisilicon Poplar Mali T720 OpenCL 1.2 TBD Qualcomm SD600eval Adreno 320, Hexagon QDSP6 V4 OpenCL 1.1 TBD Socionext F-Cue Mali T624 OpenCL 1.1 TBD Mavell Andromeda Vivante GC7000UL OpenCL 1.2 TBD MStar Kava Mali T820 OpenCL 1.2 TBD Altera Chameleon96 Cyclone V SoC FPGA OpenCL 1.0 EP TBD * ACL cannot be used on Bubblegum-96 for missing required OpenCL extensions
  • 17. Thank You #SFO17 SFO17 keynotes and videos on: connect.linaro.org For further information: www.linaro.org