©2018 Micron Technology, Inc. All rights reserved. Information, products, and/or specifications are subject
to change without notice. All information is provided on an “AS IS” basis without warranties of any kind.
Statements regarding products, including regarding their features, availability, functionality, or
compatibility, are provided for informational purposes only and do not modify the warranty, if any,
applicable to any product. Drawings may not be to scale. Micron, the Micron logo, and all other Micron
trademarks are the property of Micron Technology, Inc. All other trademarks are the property of their
respective owners.
Seamless Prediction at the Edge
Using TensorFlow on FPGAs
Brad Spiers, Principal Solutions Architect
Linley Spring Processor Conference: April 12, 2018
Prediction... at the Edge
• Limited Weight, Space and Power
• Very Limited External Bandwidth
• Cannot Move Data → Must Compute Locally
• FPGAs Have Speed, Efficiency & Memory Capability
• Now Program FPGAs – with No Code Change!
Micron Confidential
What are Field Programmable Gate Arrays (FPGAs)?
• Unlike a CPU, no Pre-Defined Instructions
• Can be Dynamically Reprogrammed
• Massive Inherent Parallelism
[Figure: block diagrams of a CPU, a GPU, and an FPGA; labels: ALU, Control, Cache]
Current Customer Challenges
• Person and Face Recognition
• Body Pose Recognition
• Fingerprint Recognition
• Voice and Speaker Identification
• Object Categorization
• Time-Series Pattern Recognition (LSTM-based RNNs)
FWDNXT Performance on FPGAs
From Just 24 Watts to Handle Power Constraints on “The Edge”
FWDNXT’s Approach
• Speed up Traces, not Layers
• Key Idea: Hide Non-Essential Work Behind Long Traces
• Traces Stretch Across Network Layers
• With Long Traces, Bandwidth Becomes Key
FWDNXT Has a Hierarchical Architecture
• Hierarchical Memory Design Achieves Efficiency
• Hidden, Long Memory Fetches Fill Buffers
• Full Buffers Feed Compute Units
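The buffer bullets above describe what is commonly called double buffering: while the compute units work on one full buffer, the next memory fetch fills another, so fetch latency is hidden behind compute. The sketch below is a generic timing model of that idea, not FWDNXT's actual design. The function names and the per-tile times are hypothetical, and it assumes exactly two buffers that swap in lockstep.

```python
def serial_time(fetch, compute):
    """Total time when nothing overlaps: each tile is fetched, then computed."""
    return sum(fetch) + sum(compute)


def overlapped_time(fetch, compute):
    """Total time with two buffers swapping in lockstep.

    While tile i is computed out of one buffer, tile i + 1 is fetched into
    the other. Each step therefore costs the slower of the two activities.
    """
    if len(fetch) != len(compute):
        raise ValueError("need one fetch time per compute time")
    if not fetch:
        return 0.0
    total = fetch[0]  # the very first fetch has nothing to hide behind
    for i in range(len(compute)):
        next_fetch = fetch[i + 1] if i + 1 < len(fetch) else 0.0
        total += max(compute[i], next_fetch)
    return total


if __name__ == "__main__":
    # Hypothetical per-tile times in arbitrary units (not measured values).
    fetch = [4.0, 4.0, 4.0, 4.0]
    compute = [5.0, 5.0, 5.0, 5.0]
    print("serial:    ", serial_time(fetch, compute))
    print("overlapped:", overlapped_time(fetch, compute))
```

In this model, when compute per tile is at least as long as a fetch, only the first fetch stays visible. When fetches are longer than compute, total time is set by memory, which is the sense in which bandwidth becomes the limiting factor.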
Micron Hybrid Memory Cube
June 8, 2018
Low-Power Bandwidth to Feed Long Traces
8.5x more bandwidth than DDR4
70% less energy per bit
How?
• Stacked DRAM
• Multiple “banks” per layer
• “Light up” smaller bank → less energy
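To show what the two ratios above mean for a single transfer, here is a small arithmetic sketch. Only the 8.5x and 70% figures come from the slide. The DDR4 baseline numbers in the example are placeholders chosen for illustration, not measured or vendor-quoted values, and the function names are hypothetical.

```python
BANDWIDTH_GAIN = 8.5   # from the slide: 8.5x more bandwidth than DDR4
ENERGY_SAVING = 0.70   # from the slide: 70% less energy per bit


def hmc_from_ddr4(ddr4_gb_per_s, ddr4_pj_per_bit):
    """Scale a DDR4 baseline by the two ratios quoted on the slide."""
    return (ddr4_gb_per_s * BANDWIDTH_GAIN,
            ddr4_pj_per_bit * (1.0 - ENERGY_SAVING))


def transfer_cost(num_bytes, gb_per_s, pj_per_bit):
    """Return (seconds, joules) needed to move num_bytes once."""
    seconds = num_bytes / (gb_per_s * 1e9)
    joules = num_bytes * 8 * pj_per_bit * 1e-12
    return seconds, joules


if __name__ == "__main__":
    # Placeholder baseline, for illustration only (an assumption).
    ddr4_bw, ddr4_energy = 20.0, 10.0  # GB/s, pJ per bit
    hmc_bw, hmc_energy = hmc_from_ddr4(ddr4_bw, ddr4_energy)

    payload = 1_000_000_000  # 1 GB moved once
    print("DDR4:", transfer_cost(payload, ddr4_bw, ddr4_energy))
    print("HMC: ", transfer_cost(payload, hmc_bw, hmc_energy))
```

Whatever baseline is plugged in, the ratios are the same: the transfer takes 1/8.5 of the time and 30% of the energy.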
Problem: How to Program FPGAs?
• Programming has Been a Barrier in the Past
− Verilog, HDL → Months to Deploy
• FWDNXT’s Snowflake Compiler & Micron FPGA Modules: ML for IoT
[Figure: flow diagram; labels: Your Network, Your Framework, Network Description, Snowflake Compiler, Micron FPGA Module, Machine Learning at the Edge]
What Model Types Can FWDNXT Handle?
• Any Model
− CNN
− RNN
− LSTM
− …
• Any Framework
− PyTorch
− Caffe
− TensorFlow
− …
FWDNXT Representations
• Now, 16-bit Fixed Point Used for Inputs
• Fixed Point: 5-bit Integer, 11-bit Fraction
• Moving to 16-bit Floating Point
• Now, 32-bit Fixed Point Used for Multiplication Output and Adds
[Figure: Fixed Point Representation]
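As a rough illustration of the format above, the sketch below quantizes values to a 16-bit code with 11 fractional bits and accumulates products in a 32-bit range. Several details are assumptions, since the slide does not state them: that the format is signed two's complement with the sign counted among the 5 integer bits (range −16 to 16 − 2^−11), that out-of-range values saturate rather than wrap, and that rounding is to nearest. All names are hypothetical.

```python
FRAC_BITS = 11                   # from the slide: 11-bit fraction
WORD_BITS = 16                   # from the slide: 16-bit inputs
ACC_BITS = 32                    # from the slide: 32-bit products and adds
SCALE = 1 << FRAC_BITS           # 2048 codes per unit

Q_MIN = -(1 << (WORD_BITS - 1))  # assumption: signed two's complement
Q_MAX = (1 << (WORD_BITS - 1)) - 1
ACC_MIN = -(1 << (ACC_BITS - 1))
ACC_MAX = (1 << (ACC_BITS - 1)) - 1


def to_fixed(x):
    """Round a float to the nearest 16-bit code, saturating at the limits."""
    q = int(round(x * SCALE))
    return max(Q_MIN, min(Q_MAX, q))


def from_fixed(q, frac_bits=FRAC_BITS):
    """Convert an integer code with frac_bits fractional bits to a float."""
    return q / (1 << frac_bits)


def dot_fixed(xs, ws):
    """Dot product on 16-bit codes with a saturating 32-bit accumulator.

    Each product of two codes carries 2 * FRAC_BITS = 22 fractional bits,
    so the result must be decoded with frac_bits=22.
    """
    acc = 0
    for x, w in zip(xs, ws):
        acc += to_fixed(x) * to_fixed(w)
        acc = max(ACC_MIN, min(ACC_MAX, acc))  # assumption: saturate
    return acc


if __name__ == "__main__":
    print(to_fixed(1.0), to_fixed(100.0))      # 100.0 is out of range
    acc = dot_fixed([1.5, -2.0], [0.5, 0.25])
    print(from_fixed(acc, 2 * FRAC_BITS))
```

With 11 fractional bits the step between representable values is 2^−11, about 0.0005, so weights or activations that need finer resolution or a range beyond ±16 lose information in this format.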
Steps to Deploy Models on FPGAs
1. Define Model in PyTorch, Caffe, or TensorFlow
2. Train Model with Data on GPUs
3. Input Framework-Trained Model into Snowflake Compiler
4. Deploy Snowflake Output Directly onto Micron FPGA Module
NO CODE CHANGE
What New Problems Can We Solve?
• Some Domains Have Problems that Require Larger Memory Footprints
− Medical Imaging
− Oil Exploration
− Videos
− Government
• Need both High-Bandwidth and High-Capacity Memory
• Micron FPGA Cards Plus FWDNXT Snowflake Compiler Provide Missing Links
Hybrid Memory Cube
Up to 512GB DDR Footprints
Advanced FPGAs
• Xilinx UltraScale+
• Intel Stratix 10
Summary
• The Edge Poses Challenges in Power and Bandwidth
• FPGAs Can Help, but Programming Was a Challenge – Until Now
• Memory Bandwidth Now Key to Machine Learning Performance
• Plus, Solve Larger Problems on Boards with up to 512GB of Memory
www.micron.com/tensorflow
