© Copyright 2019 Xilinx
Ashish Sirasao
Fellow, Accelerated Computing
ashish.sirasao@xilinx.com
Xilinx Inference Solution for Deep Learning
Deep Learning Models – A broad spectrum
Convolutional Neural Network
• Feature Extraction
• Object Detection
• Image Segmentation
Recurrent Neural Network
• Sequence and Temporal Data
• Speech to Text
• Language Translation
Multi-Layer Perceptron
• Classification
• Universal Function Approximator
• Autoencoder
[Figure: example outputs for Classification ("Dog"), Object Detection and Segmentation]
Xilinx – Focus on Inference
Deep Learning Resurgence – Up to 2015
• LeNet-5: 1998
• AlexNet: 2012
• VGG-Net: 2014
• GoogLeNet: 2014
• ResNet: 2015
Rapid Algorithmic Changes – 2015 - 2018
Deep Learning on Xilinx Adaptable Devices
Data Parallel
• 2D Array of MACs
• Flexible on-chip memory access
• High Bandwidth, Multiple Access Ports
Custom Memory Hierarchy
• Near-Memory Compute
• Programmable routing for data & filter reuse
Compression & Sparsity
• Flexible Data Types
• FP32/16, INT16/8/4/2, Binary/Ternary
• Sparsity-friendly compute
Broad Device Range
• Scalable device family for different applications
• Built-in System functions – Networking, Video, ARM
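To make "2D Array of MACs" concrete, here is a minimal Python model of a hypothetical rows x cols grid of multiply-accumulate units computing a matrix product in tiles. It assumes an output-stationary dataflow in which each unit owns one output element and performs one MAC per cycle, with operands broadcast along rows and columns. The array size, the dataflow and the function name are illustrative assumptions, not a description of any Xilinx engine; data movement and systolic timing skew are ignored.

```python
def mac_array_matmul(a, b, rows, cols):
    """Multiply A (M x K) by B (K x N) on a hypothetical rows x cols MAC array.

    Output-stationary model: each processing element (PE) owns one output
    C[i][j] of the current tile and does one multiply-accumulate per cycle.
    Returns (C, cycles), where cycles counts array cycles, not host loop steps.
    """
    m, k, n = len(a), len(b), len(b[0])
    if any(len(row) != k for row in a):
        raise ValueError("inner dimensions of A and B must match")
    c = [[0.0] * n for _ in range(m)]
    cycles = 0
    for i0 in range(0, m, rows):            # tile over output rows
        for j0 in range(0, n, cols):        # tile over output columns
            for t in range(k):              # one reduction step per cycle
                # In hardware every PE of the tile fires in the same cycle;
                # the two inner loops only emulate that parallel step.
                for i in range(i0, min(i0 + rows, m)):
                    for j in range(j0, min(j0 + cols, n)):
                        c[i][j] += a[i][t] * b[t][j]
                cycles += 1
    return c, cycles


if __name__ == "__main__":
    product, used = mac_array_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]], rows=2, cols=2)
    print(product, used)
```

Shrinking the array to 1 x 1 raises the cycle count for the same 2 x 2 product from 2 to 8, which is the data-parallel gain a larger MAC array buys.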
ALVEO Data Center Workloads
*GoogLeNet v1
https://www.xilinx.com/products/boards-and-kits/alveo.html
Variable Precision Compute Density – TOPs on VU9P
[Chart: MAX TOPs (log scale) on VU9P by weight/activation precision; estimates at 700 MHz FMAX]
Weight/Activation        MAX TOPs (VU9P)
FLOAT (SP)/FLOAT (SP)    2.18538
FP16/FP16                5.64
8b/8b                    17.48304
XFP8 (1,3,4)             23.60306
4b/4b - DSP              25.71035
XFP7 (1,3,3)             34.96608
4b/8b                    41.72917
4b/4b - LUT              81.65384
2b/8b                    92.10951
T/8b                     92.10951
B/8b                     92.10951
B/4b                     160.7017
2b/2b                    314.7075
B/B                      686.6345
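The VU9P figures can be read as multiples of the FP32 figure and, under an assumption the slide does not state (one MAC counted as two operations), converted back to MACs per cycle at the quoted 700 MHz. The sketch below only rearranges the published numbers; the two-ops-per-MAC convention and the helper names are assumptions.

```python
# Peak TOPs for the VU9P, copied from the table above (700 MHz FMAX estimates).
VU9P_MAX_TOPS = {
    "FLOAT (SP)/FLOAT (SP)": 2.18538,
    "FP16/FP16": 5.64,
    "8b/8b": 17.48304,
    "XFP8 (1,3,4)": 23.60306,
    "4b/4b - DSP": 25.71035,
    "XFP7 (1,3,3)": 34.96608,
    "4b/8b": 41.72917,
    "4b/4b - LUT": 81.65384,
    "2b/8b": 92.10951,
    "T/8b": 92.10951,
    "B/8b": 92.10951,
    "B/4b": 160.7017,
    "2b/2b": 314.7075,
    "B/B": 686.6345,
}

FMAX_HZ = 700e6   # clock frequency the estimates assume
OPS_PER_MAC = 2   # ASSUMPTION: one multiply-accumulate counted as two operations


def relative_to_fp32(table):
    """Return each precision's peak TOPs as a multiple of the FP32 figure."""
    base = table["FLOAT (SP)/FLOAT (SP)"]
    return {name: tops / base for name, tops in table.items()}


def implied_macs_per_cycle(tops, fmax_hz=FMAX_HZ, ops_per_mac=OPS_PER_MAC):
    """Back out MAC operations per clock cycle from a peak TOPs figure."""
    return tops * 1e12 / (ops_per_mac * fmax_hz)


if __name__ == "__main__":
    ratios = relative_to_fp32(VU9P_MAX_TOPS)
    for name, tops in VU9P_MAX_TOPS.items():
        print(f"{name:24s} {tops:10.3f} TOPs  {ratios[name]:7.1f}x FP32  "
              f"~{implied_macs_per_cycle(tops):,.0f} MACs/cycle")
```

For example, the 8b/8b entry works out to 8.0x the FP32 figure.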
Overlay Architecture – Custom Processors Exploiting Xilinx FPGA Flexibility
• Customized overlays with an instruction set architecture (ISA) for optimized implementation
• Easy plug and play with the software stack
• MLP Engine – Scalable sparse and dense implementation*
• xDNN – CNN engine for large 16 nm Xilinx devices**
• Deephi DPU – Flexible CNN engine with embedded focus
• CHaiDNN – HLS-based open source offering***
• Deephi ESE – LSTM speech-to-text engine
• Random Forest – Configurable RF classification
* https://github.com/Xilinx/gemx
** https://github.com/Xilinx/ml-suite
*** https://github.com/Xilinx/CHaiDNN
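The MLP engine is described as supporting both sparse and dense implementations. As a generic illustration (not the GEMX code), the sketch below computes one fully connected layer, y = ReLU(Wx), both densely and from a compressed sparse row (CSR) copy of W that stores and multiplies only the nonzero weights. CSR is a standard format chosen here as an assumption; the slides do not say which sparse format the engine uses.

```python
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def relu(v: Vector) -> Vector:
    return [max(0.0, a) for a in v]


def dense_layer(w: Matrix, x: Vector) -> Vector:
    """y = ReLU(W x), multiplying every weight including zeros."""
    return relu([sum(wij * xj for wij, xj in zip(row, x)) for row in w])


def to_csr(w: Matrix) -> Tuple[Vector, List[int], List[int]]:
    """Compress W into CSR form: nonzero values, their columns, row pointers."""
    values: Vector = []
    col_idx: List[int] = []
    row_ptr: List[int] = [0]
    for row in w:
        for j, wij in enumerate(row):
            if wij != 0.0:
                values.append(wij)
                col_idx.append(j)
        row_ptr.append(len(values))  # row r spans values[row_ptr[r]:row_ptr[r + 1]]
    return values, col_idx, row_ptr


def sparse_layer(values: Vector, col_idx: List[int], row_ptr: List[int], x: Vector) -> Vector:
    """y = ReLU(W x), touching only the stored nonzero weights."""
    y: Vector = []
    for r in range(len(row_ptr) - 1):
        acc = 0.0
        for p in range(row_ptr[r], row_ptr[r + 1]):
            acc += values[p] * x[col_idx[p]]
        y.append(acc)
    return relu(y)


if __name__ == "__main__":
    W = [[0.0, 2.0, 0.0, -1.0],
         [0.0, 0.0, 0.0, 0.0],
         [3.0, 0.0, 0.5, 0.0]]
    x = [1.0, 2.0, 3.0, 4.0]
    print(dense_layer(W, x), sparse_layer(*to_csr(W), x))
```

Here only 4 of the 12 weights are nonzero, so the sparse path performs 4 multiplies instead of 12; the saving grows as more weights are pruned away.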
Inference Optimization Techniques
Hot Chips 2018 Tutorial – Michaela Blott, Xilinx Inc.
Model Pruning and Integer Arithmetic – Mainstream
Model compression provides compute and memory gains:
• RNN Models – 5x to 20x
• CNN Models – 30% to 10x
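The slide does not say how the pruned models were produced. One widely used, simple technique is magnitude pruning: zero the weights with the smallest absolute values and keep the rest. The sketch below applies that generic technique to a flat weight list; it is not the Deephi pruning tool, real flows usually fine-tune after pruning to recover accuracy, and the 50% target in the usage example is arbitrary.

```python
from typing import List


def magnitude_prune(weights: List[float], sparsity: float) -> List[float]:
    """Zero the `sparsity` fraction of weights with the smallest |w|."""
    if not 0.0 <= sparsity < 1.0:
        raise ValueError("sparsity must be in [0, 1)")
    n_prune = int(len(weights) * sparsity)
    # Indices ordered by magnitude, smallest first (sorted() is stable for ties).
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = list(weights)
    for i in order[:n_prune]:
        pruned[i] = 0.0
    return pruned


def compression_ratio(original: List[float], pruned: List[float]) -> float:
    """Dense weight count divided by the number of surviving nonzero weights."""
    nonzero = sum(1 for w in pruned if w != 0.0)
    return float("inf") if nonzero == 0 else len(original) / nonzero


if __name__ == "__main__":
    w = [0.9, -0.05, 0.3, 0.01, -0.7, 0.2, -0.02, 0.5, 0.08, -0.4]
    p = magnitude_prune(w, 0.5)
    print(p, compression_ratio(w, p))
```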
Increasing Accuracy of Reduced Precision CNNs & BNNs:
• 8-bit solutions lose no significant accuracy
• BNNs are improving rapidly
• Near consensus that inference can use very low precision
  • Image / CNN: 2-bit (binary)
  • Speech / RNN: 3-bit (ternary)
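The 8-bit claim rests on quantizing weights and activations to 8-bit integers. A minimal sketch of one common scheme, symmetric per-tensor quantization with a single scale factor, follows; it is an illustrative assumption, since the slide does not describe the quantizer used by the Xilinx tools. The dot product accumulates integer products and applies the two scales once at the end, which is what lets hardware use cheap integer MACs.

```python
from typing import List, Tuple


def quantize_int8(values: List[float]) -> Tuple[List[int], float]:
    """Symmetric per-tensor quantization: v ~= q * scale with q in [-127, 127]."""
    max_abs = max((abs(v) for v in values), default=0.0)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    # round() is round-half-to-even; the clamp guards against rounding past 127.
    q = [max(-127, min(127, round(v / scale))) for v in values]
    return q, scale


def dequantize(q: List[int], scale: float) -> List[float]:
    return [qi * scale for qi in q]


def int8_dot(qa: List[int], sa: float, qb: List[int], sb: float) -> float:
    """Integer multiply-accumulate, then a single floating-point rescale."""
    acc = sum(a * b for a, b in zip(qa, qb))  # would be an int32 accumulator in hardware
    return acc * sa * sb


if __name__ == "__main__":
    a, b = [0.5, -0.25, 1.0], [1.0, 0.5, -0.5]
    qa, sa = quantize_int8(a)
    qb, sb = quantize_int8(b)
    exact = sum(x * y for x, y in zip(a, b))
    print(qa, qb, int8_dot(qa, sa, qb, sb), exact)
```

The rounding error per value is at most half a quantization step (scale / 2), which is why a well-chosen scale keeps 8-bit results close to the floating-point ones.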
Whole Application Acceleration
Example - Smart City/Surveillance
Efficient AI Deployment Requires Full Application Optimization
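Why whole-application optimization matters can be put in numbers with Amdahl's law: if only the DNN portion of a video pipeline is accelerated, the untouched decode, resize and post-processing work caps the overall gain. The fractions and speedups below are hypothetical, chosen only to show the shape of the trade-off for a smart-city style pipeline.

```python
def amdahl_speedup(accelerated_fraction: float, kernel_speedup: float) -> float:
    """Overall speedup when `accelerated_fraction` of runtime runs `kernel_speedup`x faster."""
    if not 0.0 <= accelerated_fraction <= 1.0:
        raise ValueError("accelerated_fraction must be in [0, 1]")
    if kernel_speedup <= 0:
        raise ValueError("kernel_speedup must be positive")
    serial = 1.0 - accelerated_fraction  # work left on the original path
    return 1.0 / (serial + accelerated_fraction / kernel_speedup)


if __name__ == "__main__":
    # Hypothetical: DNN inference is 60% of per-frame time before acceleration.
    print(f"DNN only, 20x faster:      {amdahl_speedup(0.60, 20):.2f}x")
    print(f"DNN only, infinitely fast: {1 / (1 - 0.60):.2f}x")
    # Hypothetical: pre/post-processing also moved onto the device (95% accelerated).
    print(f"Whole pipeline, 20x:       {amdahl_speedup(0.95, 20):.2f}x")
```

With these made-up numbers, accelerating only the DNN tops out at 2.5x no matter how fast the engine is, while moving 95% of the pipeline onto the device reaches about 10x at the same 20x kernel speed.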
Xilinx AI Development Stack – Edge to Cloud
Edge/Embedded | Cloud/DC
Platforms: Z7020 Board, Z7020 SOM, ZU2/3 SOM, ZU2/3 Card, ZU9 Card, ZCU102, ZCU104, Ultra96 | Xilinx U200, U250
FPGA IP: Deephi DPU | xDNN v2 and v3
Software Stack: Deephi Runtime, Deephi Compiler | xfDNN Runtime, xfDNN Compiler
Pruning and Quantization (Caffe and TensorFlow EA)
Models: 20+ pruned / customized / basic models
Deephi LSTM
SDSoC | SDAccel
Xilinx Tensor Processor: An Inference Engine, Network Compiler + Runtime for Xilinx FPGAs
xDNN Performance Comparison – Batch of 1
https://www.xilinx.com/support/documentation/white_papers/wp504-accel-dnns.pdf
https://github.com/Xilinx/ml-suite
Server Platforms: Intel x86, AMD Epyc, Power9, ARM
FaaS: AWS F1, Nimbix, Ali Cloud, Huawei
Xilinx SDx Boards: ALVEO U200, ALVEO U250, ALVEO U280
Python interface to simplify xdnn usage
[Code examples: Blocking; Non-Blocking, 8-FPGA]
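The slide contrasts blocking and non-blocking use of the xdnn Python interface across 8 FPGAs, but the code itself survives only as screenshots. The sketch below shows the general pattern with the Python standard library alone; `run_on_device` is a hypothetical stand-in that sleeps instead of calling hardware, and nothing here reproduces the ml-suite API. In blocking mode the host waits for each result before sending the next request; in non-blocking mode it submits work for all devices and collects results afterwards.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple

NUM_DEVICES = 8  # the slide's 8-FPGA configuration
Result = Tuple[int, int, str]


def run_on_device(device_id: int, image_id: int) -> Result:
    """Hypothetical stand-in for one inference call on one accelerator."""
    time.sleep(0.01)  # pretend the device needs 10 ms per image
    return device_id, image_id, f"label-{image_id}"


def blocking(images: List[int]) -> List[Result]:
    """Send one request, wait for its result, then send the next."""
    return [run_on_device(0, img) for img in images]


def non_blocking(images: List[int], num_devices: int = NUM_DEVICES) -> List[Result]:
    """Submit every request round-robin across devices, then gather results."""
    with ThreadPoolExecutor(max_workers=num_devices) as pool:
        futures = [pool.submit(run_on_device, i % num_devices, img)
                   for i, img in enumerate(images)]
        # The host thread is free at this point to prepare the next batch.
        return [f.result() for f in futures]


if __name__ == "__main__":
    imgs = list(range(32))
    for mode in (blocking, non_blocking):
        start = time.perf_counter()
        mode(imgs)
        print(f"{mode.__name__}: {time.perf_counter() - start:.3f} s")
```

A thread pool of eight workers only simulates eight devices; a real multi-card runtime would typically keep a separate request queue per device.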
Demo successfully shown at Xilinx Developer Forum – most photographed demo
GoogLeNet Performance: 30,000 images/sec, Int8, Batch 1, XDNN v3
Final softmax and FC layers running on AMD CPU, overlapped with FPGA, using optimized OpenBLAS
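Running the final FC and softmax layers on the host while the FPGA works on the next image is a two-stage producer/consumer pipeline. The sketch below shows that structure with Python threads and a bounded queue; `fpga_features` is a hypothetical stand-in for the convolutional layers on the accelerator, and the plain-Python FC layer stands in for the optimized OpenBLAS call the slide mentions. In Python the threads only illustrate the hand-off; the real overlap comes from the FPGA and the CPU being separate devices.

```python
import math
import queue
import threading
from typing import Dict, Iterable, List

Matrix = List[List[float]]


def fpga_features(image_id: int) -> List[float]:
    """Hypothetical stand-in for the convolutional layers running on the FPGA."""
    return [float(image_id % 3), 1.0, -1.0]


def fc_softmax(features: List[float], weights: Matrix) -> List[float]:
    """Final fully connected layer followed by softmax, run on the host CPU."""
    logits = [sum(w * f for w, f in zip(row, features)) for row in weights]
    m = max(logits)  # subtract the max logit for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def pipelined(image_ids: Iterable[int], weights: Matrix) -> List[List[float]]:
    """Overlap stage 1 (features) with stage 2 (FC + softmax) via a bounded queue."""
    ids = list(image_ids)
    q = queue.Queue(maxsize=4)  # bounded, so the producer cannot run far ahead
    done = object()             # sentinel marking the end of the stream
    results: Dict[int, List[float]] = {}

    def producer() -> None:
        for i in ids:
            q.put((i, fpga_features(i)))  # blocks while the queue is full
        q.put(done)

    def consumer() -> None:
        while True:
            item = q.get()
            if item is done:
                break
            i, feats = item
            results[i] = fc_softmax(feats, weights)

    threads = [threading.Thread(target=producer), threading.Thread(target=consumer)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return [results[i] for i in ids]


if __name__ == "__main__":
    fc = [[0.5, -0.2, 0.1], [0.3, 0.8, -0.5]]  # hypothetical 2-class FC layer
    for probs in pipelined(range(3), fc):
        print([round(p, 3) for p in probs])
```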
Massive Scaleout: EPYC BOXX + 8 ALVEO U250
[Chart: GoogLeNet Performance (img/sec), Int8, Batch 1 – Single FPGA vs. 8 FPGAs, XDNN v2 vs. XDNN v3]
Single U250 performance with XDNN v3 for GoogLeNet
Configuration              PL Kernel Peak TOP/s (Int8)   Latency (ms)   Images/sec
4 kernels – Throughput     19.088                        1.82           4127
4 kernels – Low Latency    19.088                        1.18           3389
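The two kernel configurations trade latency for throughput. Little's law (average work in flight = throughput x latency) gives a quick consistency check on the table. The arithmetic below uses only the published numbers; reading the result as images in flight assumes the latency is per image and the throughput is steady state, which the slide does not state.

```python
KERNELS = 4

# (images/sec, latency in ms), copied from the table above.
CONFIGS = {
    "4 kernels - throughput": (4127, 1.82),
    "4 kernels - low latency": (3389, 1.18),
}


def in_flight(images_per_sec: float, latency_ms: float) -> float:
    """Little's law: average number of images in the system."""
    return images_per_sec * latency_ms / 1000.0


if __name__ == "__main__":
    for name, (ips, lat_ms) in CONFIGS.items():
        n = in_flight(ips, lat_ms)
        print(f"{name}: {n:.2f} images in flight, {n / KERNELS:.2f} per kernel")
    # Scale-out headline: 30,000 images/sec across 8 U250 cards.
    print(f"per card at scale-out: {30000 / 8:.0f} images/sec")
```

The low-latency setting works out to about one image per kernel, while the throughput setting keeps roughly two per kernel in flight, which is consistent with (though not proof of) extra pipelining in that mode. The 3,750 images/sec per card at scale-out falls between the two single-card figures.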
Ready-to-use Algorithms – Evaluate Baseline Models
• Face: Object Detection, Landmarks, Recognition and Anti-spoofing
• People: Object Detection, Pose estimation, Re-identification
• Video Analytics: Object Detection, Multi-object tracking; Attribute – Person, Car; Text – Plate number
• Segmentation: Scene parsing, lane detection
• Medical Imaging: Cervical cancer classification, guide-wire detection, cell segmentation
• Satellite Imaging: Object detection
Accelerated pre- and post-processing
Model Compression – Enabling Next Level of Performance

Classification Networks
Network | Baseline Top-5 | Result 1 Top-5 | Result 1 ΔTop-5 | Result 1 Ratio | Result 2 Top-5 | Result 2 ΔTop-5 | Result 2 Ratio
Resnet50 [7.7G] | 91.65% | 91.23% | -0.42% | 40% | 90.79% | -0.86% | 32%
Inception_v2 [4.0G] | 91.07% | 90.37% | -0.70% | 60% | 90.07% | -1.00% | 55%
SqueezeNet [778M] | 83.19% | 82.46% | -0.73% | 89% | 81.57% | -1.62% | 75%

Detection Networks
Network | Baseline mAP | Result 1 mAP | Result 1 ΔmAP | Result 1 Ratio | Result 2 mAP | Result 2 ΔmAP | Result 2 Ratio
DetectNet [17.5G] | 44.46 | 45.7 | +1.24 | 63% | 45.12 | +0.66 | 50%
SSD+VGG [117G] | 61.5 | 62.0 | +0.5 | 16% | 60.4 | -1.1 | 10%
[A] SSD+VGG [173G] | 57.1 | 58.7 | +1.6 | 40% | 56.6 | -0.5 | 12%
[B] Yolov2 [198G] | 80.4 | 81.9 | +1.5 | 28% | 79.2 | -1.2 | 7%

Segmentation Networks
Network | Baseline mIoU | Result 1 mIoU | Result 1 ΔmIoU | Result 1 Ratio | Result 2 mIoU | Result 2 ΔmIoU | Result 2 Ratio
FPN [163G] | 65.69% | 65.21% | -0.48% | 80% | 64.07% | -1.62% | 60%
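The slide does not define the Ratio column. Assuming it gives the fraction of the baseline operation count that remains after pruning, and reading the bracketed figures as baseline operations per inference (G = 10^9, M = 10^6), the pruned workloads and implied reductions follow directly. Both readings are assumptions, so treat the output as an interpretation of the tables rather than published results.

```python
# (baseline G-ops, Result 1 ratio, Result 2 ratio), copied from the tables above.
MODELS = {
    "Resnet50": (7.7, 0.40, 0.32),
    "Inception_v2": (4.0, 0.60, 0.55),
    "SqueezeNet": (0.778, 0.89, 0.75),
    "DetectNet": (17.5, 0.63, 0.50),
    "SSD+VGG [117G]": (117.0, 0.16, 0.10),
    "[A] SSD+VGG [173G]": (173.0, 0.40, 0.12),
    "[B] Yolov2": (198.0, 0.28, 0.07),
    "FPN": (163.0, 0.80, 0.60),
}


def pruned_gops(baseline_gops: float, ratio: float) -> float:
    """Remaining operations, ASSUMING ratio = pruned ops / baseline ops."""
    return baseline_gops * ratio


if __name__ == "__main__":
    for name, (base, r1, r2) in MODELS.items():
        print(f"{name:20s} {base:8.3f} G -> "
              f"{pruned_gops(base, r1):7.2f} G ({1 / r1:4.1f}x) / "
              f"{pruned_gops(base, r2):7.2f} G ({1 / r2:4.1f}x)")
```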
Xilinx VERSAL – Breakthrough AI Inference Performance
https://www.xilinx.com/products/silicon-devices/acap/versal.html
Adaptable.
Intelligent.

Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsMemoori
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
The Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfThe Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfSeasiaInfotech2
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024The Digital Insurer
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clashcharlottematthew16
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationSafe Software
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 

Recently uploaded (20)

Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piece
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdf
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial Buildings
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
The Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfThe Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdf
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clash
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 

Xilinx Inference solution for DL using OpenPOWER systems

  • 1. Xilinx Inference Solution for Deep Learning
      Ashish Sirasao, Fellow, Accelerated Computing, ashish.sirasao@xilinx.com
      © Copyright 2019 Xilinx
  • 2. Deep Learning Models – A broad spectrum
      Convolutional Neural Network: Feature Extraction, Object Detection, Image Segmentation
      Recurrent Neural Network: Sequence and Temporal Data, Speech to Text, Language Translation
      Multi-Layer Perceptron: Classification, Universal Function Approximator, Autoencoder
      [Figure: example outputs for Classification (“Dog”), Object Detection, and Segmentation]
  • 3. Xilinx – Focus on Inference
  • 4. Deep learning resurgence – till 2015: LeNet-5 (1998), AlexNet (2012), VGG-Net (2014), GoogLeNet (2014), ResNet (2015)
  • 5. Rapid Algorithmic Changes – 2015–2018
  • 6. Deep Learning on Xilinx Adaptable Devices
      Data Parallel: 2D array of MACs; flexible on-chip memory access; high bandwidth, multiple access ports
      Custom Memory Hierarchy: near-memory compute; programmable routing for data and filter reuse
      Compression & Sparsity: flexible data types (FP32/16, INT16/8/4/2, binary/ternary); sparsity-friendly compute
      Broad Device Range: scalable device family for different applications; built-in system functions (networking, video, ARM)
  • 7. ALVEO Data Center Workloads
      *GoogLeNet v1
      https://www.xilinx.com/products/boards-and-kits/alveo.html
  • 8. Variable Precision Compute Density – TOPs on VU9P
      VU9P MAX TOPs estimates at 700 MHz FMAX (plotted on a log scale on the slide; B = binary, T = ternary)
      Weight/Activation             VU9P MAX TOPs
      MAX FLOAT (SP)/FLOAT (SP)     2.18538
      MAX FP16/FP16                 5.64
      MAX 8b/8b                     17.48304
      XFP8 (1,3,4)                  23.60306
      MAX 4b/4b - DSP               25.71035
      XFP7 (1,3,3)                  34.96608
      MAX 4b/8b                     41.72917
      MAX 4b/4b - LUT               81.65384
      MAX 2b/8b                     92.10951
      MAX T/8b                      92.10951
      MAX B/8b                      92.10951
      MAX B/4b                      160.7017
      MAX 2b/2b                     314.7075
      MAX B/B                       686.6345
  • 9. [Image-only slide; no recoverable text]
  • 10. Overlay Architecture: Custom Processors Exploiting Xilinx FPGA Flexibility
      Customized overlays with an ISA for optimized implementation; easy plug and play with the software stack
      MLP Engine: scalable sparse and dense implementation*
      xDNN: CNN engine for large 16 nm Xilinx devices**
      Deephi DPU: flexible CNN engine with embedded focus
      CHaiDNN: HLS-based open-source offering***
      Deephi ESE: LSTM speech-to-text engine
      Random Forest: configurable RF classification
      * https://github.com/Xilinx/gemx   ** https://github.com/Xilinx/ml-suite   *** https://github.com/Xilinx/CHaiDNN
  • 11. Inference Optimization Techniques (Hot Chips 2018 tutorial, Michaela Blott, Xilinx Inc.)
  • 12. Model Pruning and Integer Arithmetic – Mainstream
      Model compression provides compute and memory gains: RNN models 5x to 20x; CNN models 30% to 10x
      Increasing accuracy of reduced-precision CNNs & BNNs: 8-bit solution loses no significant accuracy; BNNs are improving rapidly
      Near consensus that inference can be very low precision: Image / CNN: 2-bit (binary); Speech / RNN: 3-bit (ternary)
  • 13. Whole Application Acceleration Example – Smart City/Surveillance
      Efficient AI deployment requires full application optimization
  • 14. Xilinx AI Development Stack – Edge to Cloud
      Platforms: Edge/Embedded: Z7020 Board, Z7020 SOM, ZU2/3 SOM, ZU2/3 Card, ZU9 Card, ZCU102, ZCU104, Ultra96; Cloud/DC: Xilinx U200, U250
      FPGA IP: Edge/Embedded: Deephi DPU; Cloud/DC: xDNN v2 and v3
      Software Stack: Runtime: Deephi Runtime (edge), xfDNN Runtime (cloud); Compiler: Deephi Compiler (edge), xfDNN Compiler (cloud); Pruning and Quantization (Caffe and TensorFlow EA)
      Models: 20+ pruned / customized / basic models; Deephi LSTM
      Development environments: SDSoC (edge/embedded), SDAccel (cloud/DC)
  • 15. Xilinx Tensor Processor: An Inference Engine, Network Compiler + Runtime for Xilinx FPGAs
  • 16. [Image-only slide; no recoverable text]
  • 17. xDNN Performance Comparison – Batch of 1: https://www.xilinx.com/support/documentation/white_papers/wp504-accel-dnns.pdf
  • 18. https://github.com/Xilinx/ml-suite
      Server platforms: Intel x86, AMD Epyc, Power9, ARM
      FaaS: AWS F1, Nimbix, Ali Cloud, Huawei
      Xilinx SDx boards: ALVEO U200, ALVEO U250, ALVEO U280
  • 19. Python interface to simplify xdnn usage: blocking, non-blocking, 8-FPGA
  • 20. AMD / XILINX CONFIDENTIAL
      Demo successfully shown at Xilinx Developer Forum (most photographed demo)
      GoogLeNet performance: 30,000 images/sec, Int8, batch 1, XDNN v3
      Massive scale-out: EPYC BOXX + 8 ALVEO U250
      Final softmax and FC layers run on the AMD CPU, overlapped with the FPGA, using optimized OpenBLAS
      [Chart: GoogLeNet performance (img/sec), Int8, batch 1; XDNN v2 vs. XDNN v3 on a single FPGA and on 8 FPGAs]
      Single U250 performance with XDNN v3 for GoogLeNet:
        PL kernel                  Peak TOP/s (Int8)   Latency (ms)   Images/sec
        4 kernels – Throughput     19.088              1.82           4127
        4 kernels – Low Latency    19.088              1.18           3389
  • 21. Ready-to-use Algorithms – Evaluate Baseline Models
      Face: object detection, landmarks, recognition and anti-spoofing
      People: object detection, pose estimation, re-identification
      Video Analytics: object detection, multi-object tracking
      Attribute: person, car, text (plate number)
      Segmentation: scene parsing, lane detection
      Medical Imaging: cervical cancer classification, guide-wire detection, cell segmentation
      Satellite Imaging: object detection
      Accelerated pre- and post-processing
  • 22. Model Compression – Enabling Next Level of Performance
      Columns: baseline | Pruning Result 1: score, change, ratio | Pruning Result 2: score, change, ratio (bracketed figure = baseline operation count)
      Classification networks (Top-5):
        Resnet50 [7.7G]          91.65% | 91.23%, -0.42%, 40% | 90.79%, -0.86%, 32%
        Inception_v2 [4.0G]      91.07% | 90.37%, -0.70%, 60% | 90.07%, -1.00%, 55%
        SqueezeNet [778M]        83.19% | 82.46%, -0.73%, 89% | 81.57%, -1.62%, 75%
      Detection networks (mAP):
        DetectNet [17.5G]        44.46 | 45.7, +1.24, 63% | 45.12, +0.66, 50%
        SSD+VGG [117G] [A]       61.5 | 62.0, +0.5, 16% | 60.4, -1.1, 10%
        SSD+VGG [173G] [B]       57.1 | 58.7, +1.6, 40% | 56.6, -0.5, 12%
        Yolov2 [198G]            80.4 | 81.9, +1.5, 28% | 79.2, -1.2, 7%
      Segmentation networks (mIoU):
        FPN [163G]               65.69% | 65.21%, -0.48%, 80% | 64.07%, -1.62%, 60%
  • 23. Xilinx VERSAL – Breakthrough AI Inference Performance: https://www.xilinx.com/products/silicon-devices/acap/versal.html
  • 24. Adaptable. Intelligent.
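Slide 6 lists binary and ternary data types, and slide 8 gives binary weights and activations the highest compute density. A common way to evaluate a dot product between two {-1, +1} vectors is to pack the signs into bits and replace multiply-accumulate with XNOR plus a population count. The sketch below shows that generic BNN technique in plain Python. It is not a description of how Xilinx implements binary layers, and `pack_signs` and `binary_dot` are hypothetical helper names.

```python
from typing import Sequence


def pack_signs(values: Sequence[int]) -> int:
    """Pack a {-1, +1} vector into an int: bit i is 1 when values[i] == +1."""
    word = 0
    for i, v in enumerate(values):
        if v not in (-1, 1):
            raise ValueError("binary values must be -1 or +1")
        if v == 1:
            word |= 1 << i
    return word


def binary_dot(a_bits: int, b_bits: int, n: int) -> int:
    """Dot product of two length-n {-1, +1} vectors given as packed bits.

    XNOR sets a bit wherever the two signs agree. Each agreement contributes
    +1 and each disagreement -1, so dot = matches - (n - matches) = 2*matches - n.
    """
    mask = (1 << n) - 1                                   # keep only the n valid bit positions
    matches = bin(~(a_bits ^ b_bits) & mask).count("1")   # XNOR followed by popcount
    return 2 * matches - n


if __name__ == "__main__":
    a = [1, -1, 1, 1]
    b = [1, 1, -1, 1]
    print(binary_dot(pack_signs(a), pack_signs(b), len(a)))  # prints 0, same as sum(x * y)
```

On FPGA fabric the XNOR and popcount map to LUTs rather than DSP multipliers. That is why binary layers can scale far beyond the DSP-bound precisions, although the slide's figures remain estimates.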
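The VU9P figures on slide 8 are easier to compare as multiples of a reference precision. The sketch below copies the slide's estimates into a dictionary and normalizes them. The numbers are the slide's 700 MHz estimates, not measurements, and the function name `density_vs` is hypothetical.

```python
# VU9P MAX TOPs estimates at 700 MHz FMAX, transcribed from slide 8.
VU9P_TOPS = {
    "FLOAT (SP)/FLOAT (SP)": 2.18538,
    "FP16/FP16": 5.64,
    "8b/8b": 17.48304,
    "XFP8 (1,3,4)": 23.60306,
    "4b/4b - DSP": 25.71035,
    "XFP7 (1,3,3)": 34.96608,
    "4b/8b": 41.72917,
    "4b/4b - LUT": 81.65384,
    "2b/8b": 92.10951,
    "T/8b": 92.10951,
    "B/8b": 92.10951,
    "B/4b": 160.7017,
    "2b/2b": 314.7075,
    "B/B": 686.6345,
}


def density_vs(reference: str, table: dict = VU9P_TOPS) -> dict:
    """Return {precision: TOPs / TOPs(reference)} for every row of the table."""
    base = table[reference]
    return {name: tops / base for name, tops in table.items()}


if __name__ == "__main__":
    for name, ratio in density_vs("FLOAT (SP)/FLOAT (SP)").items():
        print(f"{name:>24}: {ratio:7.1f}x single precision")
```

Against the single-precision row, 8b/8b works out to exactly 8x and B/B to roughly 314x.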
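Slide 10 describes an MLP engine with "scalable sparse and dense" implementations. Pruned weight matrices are commonly stored in Compressed Sparse Row (CSR) form, so a matrix-vector product only visits the non-zero weights. The following is a minimal CPU reference sketch of that idea. It is not the API of the linked gemx repository, and `CSRMatrix` and `sparse_relu_layer` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import List, Sequence


@dataclass
class CSRMatrix:
    """Compressed Sparse Row storage: only the non-zero weights are kept."""
    n_cols: int
    values: List[float] = field(default_factory=list)        # non-zeros, row by row
    col_idx: List[int] = field(default_factory=list)         # column index of each non-zero
    row_ptr: List[int] = field(default_factory=lambda: [0])  # row r spans values[row_ptr[r]:row_ptr[r+1]]

    @classmethod
    def from_dense(cls, dense: Sequence[Sequence[float]]) -> "CSRMatrix":
        m = cls(n_cols=len(dense[0]) if dense else 0)
        for row in dense:
            for c, v in enumerate(row):
                if v != 0:
                    m.values.append(v)
                    m.col_idx.append(c)
            m.row_ptr.append(len(m.values))  # close this row
        return m

    def matvec(self, x: Sequence[float]) -> List[float]:
        """Return A @ x, visiting only the stored non-zeros."""
        if len(x) != self.n_cols:
            raise ValueError("length of x must equal the number of columns")
        y = []
        for r in range(len(self.row_ptr) - 1):
            acc = 0.0
            for k in range(self.row_ptr[r], self.row_ptr[r + 1]):
                acc += self.values[k] * x[self.col_idx[k]]
            y.append(acc)
        return y


def sparse_relu_layer(w: CSRMatrix, x: Sequence[float], bias: Sequence[float]) -> List[float]:
    """One MLP layer, relu(W x + b), with W stored sparsely."""
    return [max(0.0, s + b) for s, b in zip(w.matvec(x), bias)]
```

The work scales with the number of stored non-zeros rather than the full matrix size. This is the property that makes pruning pay off in compute, provided the hardware can exploit the irregular access pattern.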
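Slide 12 argues that 8-bit inference loses little accuracy, that very low (binary/ternary) precision is viable, and that pruning gives large compression. The sketch below shows three textbook versions of these operations on a flat weight list:
- symmetric per-tensor INT8 quantization;
- ternarization using the Ternary Weight Networks threshold heuristic (0.7 × mean |w|);
- magnitude pruning.

These are assumptions chosen for illustration, not the behaviour of the Deephi/Xilinx pruning and quantization tools named on slide 14.

```python
from typing import List, Sequence, Tuple


def quantize_int8(w: Sequence[float]) -> Tuple[List[int], float]:
    """Symmetric per-tensor INT8: w ~= scale * q with q in [-127, 127]."""
    max_abs = max((abs(v) for v in w), default=0.0)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(v / scale))) for v in w]  # round to nearest, then clamp
    return q, scale


def dequantize(q: Sequence[int], scale: float) -> List[float]:
    return [qi * scale for qi in q]


def ternarize(w: Sequence[float]) -> Tuple[List[int], float]:
    """Ternary weights {-1, 0, +1} times one scale alpha.

    Ternary Weight Networks heuristic: threshold delta = 0.7 * mean|w|;
    alpha = mean |w| over the weights that survive the threshold.
    """
    if not w:
        return [], 0.0
    delta = 0.7 * sum(abs(v) for v in w) / len(w)
    t = [0 if abs(v) <= delta else (1 if v > 0 else -1) for v in w]
    kept = [abs(v) for v, ti in zip(w, t) if ti != 0]
    alpha = sum(kept) / len(kept) if kept else 0.0
    return t, alpha


def magnitude_prune(w: Sequence[float], sparsity: float) -> List[float]:
    """Zero the `sparsity` fraction of weights with the smallest magnitude."""
    k = int(round(sparsity * len(w)))
    smallest = set(sorted(range(len(w)), key=lambda i: abs(w[i]))[:k])
    return [0.0 if i in smallest else v for i, v in enumerate(w)]


if __name__ == "__main__":
    w = [0.5, -1.27, 0.003, 0.9, -0.2, 1.0]
    print(quantize_int8(w))
    print(ternarize(w))
    print(magnitude_prune(w, 0.5))
```

Because the INT8 scale maps the largest magnitude to 127, no value is clipped. Every reconstructed weight is therefore within half a quantization step of the original.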
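Slide 19 mentions blocking and non-blocking Python usage of xdnn. The general difference can be shown with the standard library's `concurrent.futures`:
- Blocking: the host waits for each result.
- Non-blocking: the host submits work, stays free, and gathers results later.

`fake_fpga_infer` below is a hypothetical stand-in that just sleeps. It is not part of the xdnn or ML Suite API, whose actual calls are not shown on the slide.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import List, Sequence


def fake_fpga_infer(image_id: int) -> int:
    """HYPOTHETICAL stand-in for one accelerator inference call (not the xdnn API).

    Sleeps to imitate device latency and returns a made-up class index.
    """
    time.sleep(0.01)
    return image_id % 1000


def run_blocking(image_ids: Sequence[int]) -> List[int]:
    """Blocking style: the host waits for each result before issuing the next request."""
    return [fake_fpga_infer(i) for i in image_ids]


def run_non_blocking(image_ids: Sequence[int], in_flight: int = 4) -> List[int]:
    """Non-blocking style: submit every request first, collect results later.

    Up to `in_flight` requests run concurrently; the host thread is free between
    submit() and result() for pre/post-processing of other images.
    """
    with ThreadPoolExecutor(max_workers=in_flight) as pool:
        futures = [pool.submit(fake_fpga_infer, i) for i in image_ids]
        return [f.result() for f in futures]  # results come back in submission order


if __name__ == "__main__":
    ids = list(range(32))
    for runner in (run_blocking, run_non_blocking):
        start = time.perf_counter()
        runner(ids)
        print(f"{runner.__name__}: {time.perf_counter() - start:.3f} s")
```

With four requests in flight, the simulated latencies overlap, so the non-blocking run should take noticeably less wall-clock time. Exact timings depend on the machine.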
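Slide 20 notes that the final FC and softmax layers of GoogLeNet run on the host CPU with an optimized BLAS, overlapped with the FPGA. The sketch below shows that host-side step with NumPy, whose matrix product calls into the BLAS library NumPy was built against (OpenBLAS in many builds). The 1024-to-1000 shape matches GoogLeNet's classifier. The random weights and the function name `fc_softmax` are purely illustrative.

```python
import numpy as np


def fc_softmax(features: np.ndarray, weights: np.ndarray, bias: np.ndarray) -> np.ndarray:
    """Fully connected layer followed by softmax, evaluated on the host CPU.

    features: (batch, in_dim) float32 activations handed back by the accelerator
    weights:  (in_dim, n_classes); bias: (n_classes,)
    """
    logits = features @ weights + bias                    # BLAS-backed matrix product
    logits = logits - logits.max(axis=1, keepdims=True)   # subtract row max for numerical stability
    exp = np.exp(logits)
    return exp / exp.sum(axis=1, keepdims=True)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # GoogLeNet's classifier maps 1024 pooled features to 1000 ImageNet classes.
    feats = rng.standard_normal((1, 1024)).astype(np.float32)
    w = (0.01 * rng.standard_normal((1024, 1000))).astype(np.float32)
    b = np.zeros(1000, dtype=np.float32)
    probs = fc_softmax(feats, w, b)
    print(probs.shape, int(probs.argmax()))
```

The overlap described on the slide means the CPU can run this step for one image while the FPGA is already processing the next.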
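The pruning table on slide 22 does not define its "ratio" column. In every row, Result 2 has both a lower ratio and a larger accuracy drop than Result 1, which fits reading ratio as the fraction of operations kept. The sketch below takes that reading as an explicit assumption. It checks that each reported change equals the pruned score minus the baseline, and derives the remaining operation count for the classification rows. The helper names are hypothetical.

```python
from typing import List, Tuple

# Classification rows transcribed from slide 22:
# (network, baseline GOPs, baseline Top-5 %, [(pruned Top-5 %, reported change, ratio), ...])
ROWS: List[Tuple[str, float, float, List[Tuple[float, float, float]]]] = [
    ("Resnet50", 7.7, 91.65, [(91.23, -0.42, 0.40), (90.79, -0.86, 0.32)]),
    ("Inception_v2", 4.0, 91.07, [(90.37, -0.70, 0.60), (90.07, -1.00, 0.55)]),
    ("SqueezeNet", 0.778, 83.19, [(82.46, -0.73, 0.89), (81.57, -1.62, 0.75)]),
]


def delta_matches(baseline: float, pruned: float, reported: float, tol: float = 0.005) -> bool:
    """True if the reported change equals pruned minus baseline, up to rounding."""
    return abs((pruned - baseline) - reported) <= tol


def remaining_gops(baseline_gops: float, ratio: float) -> float:
    """GOPs after pruning, ASSUMING 'ratio' is the fraction of operations kept."""
    return baseline_gops * ratio


def ideal_speedup(ratio: float) -> float:
    """Upper bound if runtime scaled linearly with operation count (an idealization)."""
    return 1.0 / ratio


if __name__ == "__main__":
    for name, gops, base, results in ROWS:
        for i, (pruned, delta, ratio) in enumerate(results, start=1):
            assert delta_matches(base, pruned, delta), (name, i)
            print(f"{name} result {i}: {remaining_gops(gops, ratio):.2f} GOPs, "
                  f"at most {ideal_speedup(ratio):.2f}x, Top-5 {delta:+.2f} pts")
```

Under this reading, ResNet-50 Result 1 would keep about 3.08 of its 7.7 GOPs. Real speedups are typically lower than the 1/ratio bound, because not every layer is compute-bound.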