IoT is exploding, with 50 billion sensors predicted by 2020. These sensors create daunting data streams, but for businesses the real value comes from learning what patterns lie hidden in those streams. At the edge, power constraints suggest using technologies like FPGAs. However, FPGAs have been notoriously difficult to program—until now. Come hear how Micron can deploy deep-learning models created with TensorFlow—with no code change—on power-efficient Micron FPGA systems.
2. Prediction... At the Edge
Limited Weight, Space and Power
Very Limited External Bandwidth
Cannot Move Data; Must Compute Locally
FPGAs Have Speed, Efficiency & Memory Capability
Now Program FPGAs – with No Code Change!
Micron Confidential
3. What are Field Programmable Gate Arrays (FPGAs)?
Unlike a CPU, no Pre-Defined Instructions
Can be Dynamically Reprogrammed
Massive Inherent Parallelism
[Diagram: CPU (control, cache, a few ALUs) compared with GPU and FPGA architectures]
4. Current Customer Challenges
Person and Face Recognition
Body Pose Recognition
Fingerprint Recognition
Voice and Speaker Identification
Object Categorization
Time-Series Pattern Recognition (LSTM-based RNNs)
5. FWDNXT Performance on FPGAs
Runs From Just 24 Watts to Meet Power Constraints at “The Edge”
6. FWDNXT’s Approach
Speed up Traces, not Layers
Key Idea: Hide Non-Essential Work Behind Long Traces
Traces Stretch Across Network Layers
With Long Traces, Bandwidth Becomes Key
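The trace idea can be sketched in software. This is a minimal analogy of my own, not FWDNXT's actual implementation (their traces are FPGA instruction streams): fusing layers into one trace per input means intermediate results never get written back out to memory.

```python
def layerwise(xs, layers):
    # Layer-by-layer execution: each layer materializes a full
    # intermediate result before the next layer starts.
    for f in layers:
        xs = [f(v) for v in xs]
    return xs

def traced(xs, layers):
    # Trace execution: each input flows through ALL layers before the
    # next input is touched, so no intermediate list is ever built.
    def trace(v):
        for f in layers:
            v = f(v)
        return v
    return [trace(v) for v in xs]

# Toy "layers": scale, bias, ReLU
layers = [lambda v: v * 2, lambda v: v + 1, lambda v: max(v, 0)]
print(layerwise([1, -3, 2], layers))  # [3, 0, 5]
print(traced([1, -3, 2], layers))     # same result, no intermediates
```

Both paths compute the same values; the traced form simply keeps each value "in flight" across the whole network, which is why bandwidth, not per-layer compute, becomes the limiting factor.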
7. FWDNXT Has a Hierarchical Architecture
Hierarchical Memory Design Achieves Efficiency
Hidden, Long Memory Fetches Fill Buffers
Full Buffers Feed Compute Units
8. Micron Hybrid Memory Cube
June 8, 2018
Low-Power Bandwidth to Feed Long Traces
8.5x More Bandwidth than DDR4
70% Less Energy per Bit
How? Stacked DRAM
− Multiple “banks” per layer
− “Lighting up” a smaller bank takes less energy
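Combining the two figures on this slide gives a useful back-of-the-envelope result. The DDR4 baseline below is an arbitrary placeholder (1.0), since only the ratios from the slide matter:

```python
# Relative comparison using only the slide's figures: HMC offers 8.5x
# the bandwidth of DDR4 at 70% less energy per bit.
ddr4_bw, ddr4_epb = 1.0, 1.0           # baseline bandwidth, energy/bit
hmc_bw = 8.5 * ddr4_bw                 # 8.5x more bandwidth
hmc_epb = (1 - 0.70) * ddr4_epb        # 70% less energy per bit

# Bandwidth delivered per unit of energy spent, relative to DDR4
gain = (hmc_bw / hmc_epb) / (ddr4_bw / ddr4_epb)
print(round(gain, 1))  # 28.3
```

Roughly 28x more bandwidth per unit of energy, which is what makes long, hidden memory fetches affordable inside a 24-watt edge budget.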
9. Problem: How to Program FPGAs?
Programming has Been a Barrier in the Past
− Verilog and Other HDLs → Months to Deploy
FWDNXT’s Snowflake Compiler & Micron FPGA Modules: ML for IoT
Your Network (Your Framework) → Network Description → Snowflake Compiler → Micron FPGA Module → Machine Learning at the Edge
10. What Model Types Can FWDNXT Handle?
Any Model
− CNN
− RNN
− LSTM
− …
Any Framework
− PyTorch
− Caffe
− TensorFlow
− …
11. FWDNXT Representations
Now: 16-bit Fixed Point Used for Inputs
− Fixed Point: 5-bit Integer, 11-bit Fraction
Moving to 16-bit Floating Point
Now: 32-bit Fixed Point Used for Multiplication Outputs and Adds
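The 16-bit input format above (often written Q5.11) can be sketched with plain integer arithmetic. This is an illustration of fixed-point quantization in general, not FWDNXT's exact rounding rules; whether the sign bit is counted within the 5 integer bits is my assumption:

```python
FRAC_BITS = 11
TOTAL_BITS = 16
SCALE = 1 << FRAC_BITS             # 2048: one step is ~0.00049
LO = -(1 << (TOTAL_BITS - 1))      # -32768
HI = (1 << (TOTAL_BITS - 1)) - 1   #  32767

def to_fixed(x):
    # Quantize: scale, round to nearest, saturate to the 16-bit range
    q = round(x * SCALE)
    return max(LO, min(HI, q))

def from_fixed(q):
    return q / SCALE

print(from_fixed(to_fixed(3.14159)))  # 3.1416015625 (error < 1/2048)
print(to_fixed(100.0) == HI)          # True: values past ~16 saturate
```

This also shows why the multiply path needs 32 bits: the product of two 11-bit-fraction values carries 22 fraction bits, which overflows 16 bits but fits comfortably in a 32-bit accumulator.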
12. Steps to Deploy Models on FPGAs
1. Define Model in PyTorch, Caffe, or TensorFlow
2. Train Model with Data on GPUs
3. Input Framework-Trained Model into Snowflake Compiler
4. Deploy Snowflake Output Directly onto Micron FPGA Module
NO CODE CHANGE
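The four steps can be sketched end to end. The Snowflake compiler's real interface is not shown in this deck, so `snowflake_compile` and `fpga_deploy` below are illustrative stubs of my own, not the actual API:

```python
def define_model():
    # Step 1: describe the network in your framework of choice
    # (PyTorch / Caffe / TensorFlow). A toy layer list stands in here.
    return {"layers": ["conv3x3", "relu", "fc"]}

def train_model(model):
    # Step 2: train on GPUs with your data; weights attach to the model.
    model["weights"] = "trained"
    return model

def snowflake_compile(model):
    # Step 3 (hypothetical stub): the compiler turns the framework-trained
    # model into an FPGA instruction stream, with no change to model code.
    return {"ops": len(model["layers"]), "weights": model["weights"]}

def fpga_deploy(artifact):
    # Step 4 (hypothetical stub): load compiler output onto the FPGA module.
    return f"deployed {artifact['ops']} ops"

print(fpga_deploy(snowflake_compile(train_model(define_model()))))
# deployed 3 ops
```

The point the slide makes is that steps 1 and 2 are your existing workflow untouched; only steps 3 and 4 are new, and neither requires editing the model.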
13. What New Problems Can We Solve?
Hybrid Memory Cube: Up to 512GB DDR Footprints
Advanced FPGAs: Xilinx UltraScale+ and Intel Stratix 10
Some Domains Have Problems that Require Larger Memory Footprints
− Medical Imaging
− Oil Exploration
− Videos
− Government
Need Both High-Bandwidth and High-Capacity Memory
Micron FPGA Cards Plus the FWDNXT Snowflake Compiler Provide the Missing Links
14. Summary
The Edge Poses Challenges in Power and Bandwidth
FPGAs Can Help, but Programming Was a Challenge—Until Now
Memory Bandwidth Is Now Key to Machine Learning Performance
Plus, Solve Larger Problems on Boards with up to 512GB of Memory
www.micron.com/tensorflow