IoT is exploding, with 50 billion sensors predicted by 2020. These sensors create daunting data streams, but for businesses the real value comes from learning what patterns lie hidden in those streams. At the edge, power constraints suggest using technologies like FPGAs. However, FPGAs have been notoriously difficult to program—until now. Come hear how Micron can deploy deep-learning models created with TensorFlow—with no code change—on power-efficient Micron FPGA systems.
2. Prediction... At the Edge
Limited Weight, Space and Power
Very Limited External Bandwidth
Cannot Move Data; Must Compute Locally
FPGAs Have Speed, Efficiency & Memory Capability
Now Program FPGAs – with No Code Change!
Micron Confidential
3. What are Field Programmable Gate Arrays (FPGAs)?
Unlike a CPU, no Pre-Defined Instructions
Can be Dynamically Reprogrammed
Massive Inherent Parallelism
[Diagram: CPU (control, cache, a few ALUs) compared with GPU and FPGA architectures]
4. Current Customer Challenges
Person and Face Recognition
Body Pose Recognition
Fingerprint Recognition
Voice and Speaker Identification
Object Categorization
Time-Series Pattern Recognition (LSTM-based RNNs)
5. FWDNXT Performance on FPGAs
Runs From Just 24 Watts to Meet Power Constraints at “The Edge”
6. FWDNXT’s Approach
Speed up Traces, not Layers
Key Idea: Hide Non-Essential Work Behind Long Traces
Traces Stretch Across Network Layers
With Long Traces, Bandwidth Becomes Key
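The trace idea can be sketched in software. This is a minimal analogy of my own, not FWDNXT's actual implementation (their traces are FPGA instruction streams): fusing layers into one trace per input means intermediate results never get written back out to memory.

```python
def layerwise(xs, layers):
    # Layer-by-layer execution: each layer materializes a full
    # intermediate result before the next layer starts.
    for f in layers:
        xs = [f(v) for v in xs]
    return xs

def traced(xs, layers):
    # Trace execution: each input flows through ALL layers before the
    # next input is touched, so no intermediate list is ever built.
    def trace(v):
        for f in layers:
            v = f(v)
        return v
    return [trace(v) for v in xs]

# Toy "layers": scale, bias, ReLU
layers = [lambda v: v * 2, lambda v: v + 1, lambda v: max(v, 0)]
print(layerwise([1, -3, 2], layers))  # [3, 0, 5]
print(traced([1, -3, 2], layers))     # same result, no intermediates
```

Both paths compute the same values; the traced form simply keeps each value "in flight" across the whole network, which is why bandwidth, not per-layer compute, becomes the limiting factor.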
7. FWDNXT Has a Hierarchical Architecture
Hierarchical Memory Design Achieves Efficiency
Hidden, Long Memory Fetches Fill Buffers
Full Buffers Feed Compute Units
8. Micron Hybrid Memory Cube
June 8, 2018
Low-Power Bandwidth to Feed Long Traces
8.5x More Bandwidth than DDR4
70% Less Energy per Bit
How? Stacked DRAM
− Multiple “banks” per layer
− “Lighting up” a smaller bank takes less energy
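Combining the two figures on this slide gives a useful back-of-the-envelope result. The DDR4 baseline below is an arbitrary placeholder (1.0), since only the ratios from the slide matter:

```python
# Relative comparison using only the slide's figures: HMC offers 8.5x
# the bandwidth of DDR4 at 70% less energy per bit.
ddr4_bw, ddr4_epb = 1.0, 1.0           # baseline bandwidth, energy/bit
hmc_bw = 8.5 * ddr4_bw                 # 8.5x more bandwidth
hmc_epb = (1 - 0.70) * ddr4_epb        # 70% less energy per bit

# Bandwidth delivered per unit of energy spent, relative to DDR4
gain = (hmc_bw / hmc_epb) / (ddr4_bw / ddr4_epb)
print(round(gain, 1))  # 28.3
```

Roughly 28x more bandwidth per unit of energy, which is what makes long, hidden memory fetches affordable inside a 24-watt edge budget.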
9. Problem: How to Program FPGAs?
Programming has Been a Barrier in the Past
− Verilog and Other HDLs → Months to Deploy
FWDNXT’s Snowflake Compiler & Micron FPGA Modules: ML for IoT
Your Network (Your Framework) → Network Description → Snowflake Compiler → Micron FPGA Module → Machine Learning at the Edge
10. What Model Types Can FWDNXT Handle?
Any Model
− CNN
− RNN
− LSTM
− …
Any Framework
− PyTorch
− Caffe
− TensorFlow
− …
11. FWDNXT Representations
Now: 16-bit Fixed Point Used for Inputs
− Fixed Point: 5-bit Integer, 11-bit Fraction
Moving to 16-bit Floating Point
Now: 32-bit Fixed Point Used for Multiplication Outputs and Adds
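The 16-bit input format above (often written Q5.11) can be sketched with plain integer arithmetic. This is an illustration of fixed-point quantization in general, not FWDNXT's exact rounding rules; whether the sign bit is counted within the 5 integer bits is my assumption:

```python
FRAC_BITS = 11
TOTAL_BITS = 16
SCALE = 1 << FRAC_BITS             # 2048: one step is ~0.00049
LO = -(1 << (TOTAL_BITS - 1))      # -32768
HI = (1 << (TOTAL_BITS - 1)) - 1   #  32767

def to_fixed(x):
    # Quantize: scale, round to nearest, saturate to the 16-bit range
    q = round(x * SCALE)
    return max(LO, min(HI, q))

def from_fixed(q):
    return q / SCALE

print(from_fixed(to_fixed(3.14159)))  # 3.1416015625 (error < 1/2048)
print(to_fixed(100.0) == HI)          # True: values past ~16 saturate
```

This also shows why the multiply path needs 32 bits: the product of two 11-bit-fraction values carries 22 fraction bits, which overflows 16 bits but fits comfortably in a 32-bit accumulator.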
12. Steps to Deploy Models on FPGAs
1. Define Model in PyTorch, Caffe, or TensorFlow
2. Train Model with Data on GPUs
3. Input Framework-Trained Model into Snowflake Compiler
4. Deploy Snowflake Output Directly onto Micron FPGA Module
NO CODE CHANGE
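The four steps can be sketched end to end. The Snowflake compiler's real interface is not shown in this deck, so `snowflake_compile` and `fpga_deploy` below are illustrative stubs of my own, not the actual API:

```python
def define_model():
    # Step 1: describe the network in your framework of choice
    # (PyTorch / Caffe / TensorFlow). A toy layer list stands in here.
    return {"layers": ["conv3x3", "relu", "fc"]}

def train_model(model):
    # Step 2: train on GPUs with your data; weights attach to the model.
    model["weights"] = "trained"
    return model

def snowflake_compile(model):
    # Step 3 (hypothetical stub): the compiler turns the framework-trained
    # model into an FPGA instruction stream, with no change to model code.
    return {"ops": len(model["layers"]), "weights": model["weights"]}

def fpga_deploy(artifact):
    # Step 4 (hypothetical stub): load compiler output onto the FPGA module.
    return f"deployed {artifact['ops']} ops"

print(fpga_deploy(snowflake_compile(train_model(define_model()))))
# deployed 3 ops
```

The point the slide makes is that steps 1 and 2 are your existing workflow untouched; only steps 3 and 4 are new, and neither requires editing the model.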
13. What New Problems Can We Solve?
Hybrid Memory Cube: Up to 512GB DDR Footprints
Advanced FPGAs: Xilinx UltraScale+ and Intel Stratix 10
Some Domains Have Problems that Require Larger Memory Footprints
− Medical Imaging
− Oil Exploration
− Videos
− Government
Need Both High-Bandwidth and High-Capacity Memory
Micron FPGA Cards Plus the FWDNXT Snowflake Compiler Provide the Missing Links
14. Summary
The Edge Poses Challenges in Power and Bandwidth
FPGAs Can Help, but Programming Was a Challenge—Until Now
Memory Bandwidth Is Now Key to Machine Learning Performance
Plus, Solve Larger Problems on Boards with up to 512GB of Memory
www.micron.com/tensorflow