GTC 2015 – Session S5429
Creating Dense Mixed GPU and FPGA Systems
With Tegra K1s Using OpenCL & CUDA
Lance Brown, Director - HPC
ColoradoEngineering.com
Lance.brown@coloradoengineering.com
719-641-7287 Cell
27 March 2015 ColoradoEngineering.com - Public Release 1
We Can Solve Really Cool Problems Now
• Heterogeneous computing is more than CPU + GPU
• ARM processors changed the game
• NVIDIA - GPU + ARM - CUDA
• TI - DSP + ARM - OpenCL
• Altera - FPGA + ARM – OpenCL
• Scalable from handheld to Enterprise & HPC
Why Listen to CEI?
• Been using FPGAs since 1985
• Been solving massively parallel problems for over 30 years
• We have designed, and are designing, multiple 24- and 32-layer boards featuring Altera
FPGAs & NVIDIA GPUs
• Early adopter of new technologies and experts at marrying existing
technologies in new ways
Game Changer #1
Altera’s Hard Floating Point Unit IP & OpenCL
• FPGAs have traditionally supported soft floating point
• Altera introduced IEEE 754 Hard Floating Point with Arria 10
• Arria 10 FPGAs are rated from 140 GigaFLOPS (GFLOPS) to 1.5
TeraFLOPS (TFLOPS)
• Details at: https://www.altera.com/en_US/pdfs/literature/po/bg-floating-point-fpga.pdf
• OpenCV & Suricata Implementations Using OpenCL
• Partial Reconfiguration for Streamlined OpenCL Development
• Arria 10 is built on TSMC 20 nm; the follow-on Stratix 10 moves to Intel’s 14 nm FinFET (Tri-Gate) fab
Game Changer #2
NVIDIA Makes Tegra K1 Available
• GPU + ARM @ low power
• Very important – camera interfaces galore
• Can do significant processing at each edge node now
• Jetson Kit – awesome eval kit & affordable
• More importantly – chipset available through Arrow!
• Details at: https://developer.nvidia.com/hardware-design-and-development
CEI’s Epiphany – Ultimate CV Platform
Altera Arria 10 (1500 GFLOPS) + NVIDIA Tegra K1 (326 GFLOPS)?
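As a rough illustration of what the pairing adds up to, the sketch below sums the two peak figures quoted on the slide. The function name is hypothetical, and it assumes peak numbers simply add, which real workloads will not achieve once data movement between devices is counted.

```python
# Sketch: theoretical peak throughput of an Arria 10 + Tegra K1 combination.
# The two constants are the figures quoted on the slide; treating them as
# additive is an optimistic assumption, not a measured result.

ARRIA10_PEAK_GFLOPS = 1500  # top-end Arria 10, per the slide
TK1_PEAK_GFLOPS = 326       # one Tegra K1, per the slide


def combined_peak_gflops(num_fpgas: int, num_tk1: int) -> int:
    """Sum of the quoted peak GFLOPS for the given device counts."""
    return num_fpgas * ARRIA10_PEAK_GFLOPS + num_tk1 * TK1_PEAK_GFLOPS


if __name__ == "__main__":
    print(combined_peak_gflops(1, 1))  # one FPGA plus one TK1
    print(combined_peak_gflops(1, 2))  # one FPGA plus dual TK1s
```

Calling it with two TK1s corresponds to the dual-TK1 board introduced on the next slide.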
First Union – Dual TK1s + Arria 10
[Block diagram: HPC-A10-K1GPU]
• HPC-A10 (Arria 10) carrier: x8 PCIe Gen3, GigE, USB Blaster, K61 health monitoring, SMA connectors including CLK-IN
• Memory: 2/4 GB Micron HMC; two QDR II+ banks (144 Mb, 1334 MT/s each)
• Network: two QSFP+ cages, each 1 x 40 GbE or 4 x 10 GbE
• Video and I/O: DisplayPort source, DisplayPort sink, two USB 3.0, optional VITA 57 FMC HPC site
• PCIe switch: x4 PCIe Gen2 plus an extra x4 PCIe Gen2
• Two Tegra K1 System-On-Modules (TK1-SOM), each with 16/32/64 GB eMMC, 2/4/8 Gbit DDR3, USB, GigE, HDMI
[Inset: TK1-SOM Tegra K1 System-On-Module, available stand-alone]
• 2 inches x 2 inches
• 16/32/64 GB eMMC, 1/2/4 GB DDR3L, USB 2.0, GigE, HDMI
• External power, x4 PCIe Gen2, clocks, I2C, JTAG, UART
HPC-A10-K1GPU
Design Details
• NVIDIA GPUDirect Support
• TK1s are root nodes
• TK1s can be field upgraded
• 8 high-speed 10 GbE ports
• CUDA on TK1
• OpenCL on Arria 10
• 2 GB/s to each TK1
• HMC is 17X faster than DDR3
• 12 to 25 Camera/Sensor I/Os
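The 2 GB/s per-TK1 figure is consistent with a x4 PCIe Gen2 link, and the following sketch shows the arithmetic. It assumes standard Gen2 signalling (5 GT/s per lane with 8b/10b coding) and ignores packet overhead; the function names are illustrative only.

```python
# Sketch: per-direction bandwidth of a PCIe Gen2 link and the aggregate
# raw line rate of a group of Ethernet ports. Protocol overhead is ignored,
# so real throughput will be somewhat lower.

GEN2_GT_PER_S = 5.0           # giga-transfers per second per lane (PCIe Gen2)
ENCODING_EFFICIENCY = 8 / 10  # 8b/10b coding: 8 payload bits per 10 line bits


def pcie_gen2_gbytes_per_s(lanes: int) -> float:
    """Per-direction bandwidth in GB/s for a Gen2 link with `lanes` lanes."""
    gbits = lanes * GEN2_GT_PER_S * ENCODING_EFFICIENCY
    return gbits / 8  # bits to bytes


def ethernet_gbytes_per_s(ports: int, gbit_per_port: float) -> float:
    """Aggregate raw line rate in GB/s for a group of Ethernet ports."""
    return ports * gbit_per_port / 8


if __name__ == "__main__":
    print(pcie_gen2_gbytes_per_s(4))     # x4 Gen2 link to one TK1
    print(ethernet_gbytes_per_s(8, 10))  # eight 10 GbE ports
```

The x4 case comes out at the 2 GB/s quoted above, while the eight 10 GbE ports together carry about 10 GB/s of raw line rate.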
Single Node
• 1 to 21 Cameras/Sensors
• Makes dumb cameras smart
• 10/40 GbE Sensors
• OpenCL on FPGA
• CUDA on Tegra
[Diagram: a single HPC-A10-K1GPU node ringed by 21 cameras (C), with two 4 x 10 GbE ports, DisplayPort, two USB/GigE ports, and an FMC site]
Tesla K80s + HPC-A10-K1GPU
[Diagram: the HPC-A10-K1GPU node with its cameras (C), two 4 x 10 GbE ports, DisplayPort, two USB/GigE ports, and FMC site, linked to four Tesla K80s via GPUDirect]
Sensor Gateway
Smart Host Bus Adapter (HBA)
[Diagram: 40 GbE and FMC links connecting sensors (radar, MRI, PET, camera, EW, etc.), the sensor cloud, and Tesla K80 clusters]
Programming FPGAs with OpenCL
• Easy to do now
• https://youtu.be/o5WtYiY5Hao
• Proficient in a day or two
• CAPI support too
• 95% to 99% as efficient as VHDL
EDGE Node Processing
• Process on the EDGE using GRID
• Distributed deep learning node
• Low cost
• 4G enabled
• Fusion of Radar, EO, IO and Sound
• Download apps from Google Play
• Feedback to Tesla K80s via GRID
• SmartCity Ready
• Military Level Device Security Built-in
[Block diagram: edge node]
• NVIDIA Tegra K1/X1: computer vision, video compression; four 5 MP cameras
• 24 GHz radar system: motion detection, camera queuing; four patch antennas
• Four directional mics
• COMMS: alerts, streaming video over 4G LTE, WiFi, Bluetooth, USB
• Altera Cyclone V: appliance security
Distributed Aperture System
Distributed Sensors
• Large vehicle/Military ADAS
• SA360 systems
• Retrofit casino camera systems
• Make any sensor system smart
• Tegra K1/X1s are scalable
• Mixture of CUDA & OpenCL
[Block diagram: distributed aperture system]
• Altera Arria 10 SoC (x2 ARM, OpenCL) with 4/8 GB HMC and QDR-II+ or QDR-IV
• Nine NVIDIA Tegra X1 modules (x4 ARM, CUDA/Linux, OpenCV, H.264/H.265), each with 64 GB eMMC, 8 GB DDR4, USB3 or GigE, HDMI, and a x4 Gen2 PCIe link at 2 GB/s
• Removable SATA storage
• 40/10 GbE ports
• Main display GPU
Challenges
Hardware, Interconnects & Software
• FPGA + GPU
• CUDA, OpenCL or CUDA + OpenCL
• Working with MDA & AFRL on solutions
• Bandwidth
• Tegra K1/X1 are limited to x4 Gen2 PCIe, which caps the number and resolution of sensors attached to
the Tegra
• More processing has to be done on the Tegra, but that is okay since Tegras keep
increasing in power every year
• Gen3 PCIe would be awesome
• PCIe backplane – Using 40 GbE ports eliminates PCIe bottleneck
• Root Nodes
• Tegra wants to be the root complex, so non-transparent switches need to be used
• If Tegra could be an endpoint, a whole new world would open up
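To make the bandwidth point concrete, here is a rough sketch of how many uncompressed camera streams fit through one link. The 5 MP sensor size matches the cameras shown on the edge-node slide; the 30 fps frame rate and 2 bytes per pixel are illustrative assumptions, and the function names are hypothetical.

```python
# Sketch: number of raw camera streams that fit in a link's bandwidth.
# Frame rate and bytes per pixel are assumptions chosen for illustration;
# link overhead and any on-sensor compression are ignored.


def stream_mbytes_per_s(megapixels: float, fps: float, bytes_per_pixel: float) -> float:
    """Raw data rate of one sensor in MB/s (1 MB = 10**6 bytes)."""
    return megapixels * fps * bytes_per_pixel


def max_streams(link_gbytes_per_s: float, stream_mb_per_s: float) -> int:
    """Whole number of streams that fit in the link, ignoring overhead."""
    return int(link_gbytes_per_s * 1000 // stream_mb_per_s)


if __name__ == "__main__":
    # Assumed sensor: 5 MP, 30 frames/s, 16 bits per pixel.
    rate = stream_mbytes_per_s(megapixels=5, fps=30, bytes_per_pixel=2)
    print(rate)                       # MB/s per camera
    print(max_streams(2.0, rate))     # x4 Gen2 PCIe at 2 GB/s
    print(max_streams(40 / 8, rate))  # one 40 GbE port at raw line rate
```

Under these assumptions a single Tegra link saturates after a handful of cameras, which is why raising resolution or frame rate forces either more processing at the sensor or a wider interconnect.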
Future Architectures
Even Cooler Designs Possible
• Altera
• Arria 10 SoC
• Eliminates need for x86 CPU to run OpenCL
• Truly stand-alone appliances
• 100 GbE interfaces
• Stratix 10 and Stratix 10 SoC
• >10 TFLOPs for 100W
• Details: https://www.altera.com/products/fpga/stratix-series/stratix-10/overview.html
• NVIDIA VOLTA
• Looking for NVLink intermingling with FPGAs
• Virtual FPGAs + Virtual GPUs
• Allow instant scaling and data protection
Summary
• GPU + FPGA can solve amazing and fun problems
• Tegra K1/X1 provide incredible capability at low cost which reduces
the size of FPGA needed.
• OpenCL and Hard Floating Point IP make the Altera FPGAs a great
partner with NVIDIA GPUs
• CEI is making scalable solutions to allow application developers to
deploy from handheld to enterprise/HPC
Hardware & Software Capabilities
• Enterprise & Embedded SW
• Net Centric, SOA, web services, J2EE, SQL
• C/C++
• CUDA & OpenCL
• Embedded real time code, RTOS, hardware
drivers, Fault Detection / Fault Isolation, etc.
• Simulations, APIs, and GUIs
• Cognitive Software
• Device Drivers
• National Instruments LabVIEW
• DO-178C
• FPGA designs (VHDL/Verilog/Simulink)
• RF Design
▪ System / Subsystem Designs
▪ 30+ complex board designs
▪ 32 layer PCBs with blind and buried vias
▪ High speed (100s MHz to x GHz)
▪ Analog (RF & I/Q Receivers)
▪ Digital (FPGAs, DSPs, general purpose)
▪ ADC and DAC
▪ Standard and custom IO (busses, fabrics,
SerDes, etc.)
▪ Ruggedization and thermal management
▪ CSWaP
▪ Serial I/O (e.g. PCIe, SerDes)
▪ DO-254
For More Information
on Standard Products and
Custom Engineering Services
Call Us – 719-388-8582 Office
Email Us – lance.brown@coloradoengineering.com
Visit Us – Colorado Springs, CO (Sunny 300+ Days)
Browse Us – www.ColoradoEngineering.com