The LEGaTO project has received funding from the European Union's Horizon 2020 research and
innovation programme under the grant agreement No 780681
16.10.20
Use cases
Computer Systems Week
Micha vor dem Berge
christmann informationstechnik + medien
WP5 Objectives
• Develop & optimize use cases
− Build demonstrators
• Integrate all developed components
− Hardware, Firmware, Middleware, Runtime Frontend/Backend, Tools
• Evaluate project’s objectives
− Efficiency, runtime, TCO, etc.
WP5 and the LEGaTO Toolchain
Use cases
• Smart home (UNIBI)
• Smart city (BSC)
• Infection research (HZI)
• Secure IoT Gateway (CHR)
• Machine learning (MIS)
Use Case Smart City
Overview
• Air quality forecast for urban areas
• Running CFD simulations with Alya
• Based on real sensor data from Barcelona City
Use Case Smart City
Optimizations
• Initial optimization plan
− Allow more complex simulations: reduce execution time
− Improve energy efficiency: make use of FPGAs
− Improve FPGA designer productivity: port kernels using OmpSs@FPGA
• Compute kernels
− Explicit Momentum Solver uses ~60% of total computation time
− Sparse Matrix-Vector multiplication (SpMV) uses ~10% of total computation time
• Drawbacks
− Incompatibilities between Mercurium and the Fortran implementation of Alya
− Low performance of the FPGA implementation when running sparse algebra algorithms
• Final solution
− Change the LEGaTO component to NVIDIA's Xavier GPUs
− Port the linear solver to exploit NVIDIA's Xavier architecture
− Algebraic operations and SpMV are compatible with stream processing
− Baseline: one node of MareNostrum4
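The kernel shares quoted above (~60% for the Explicit Momentum Solver, ~10% for SpMV) bound how much the whole simulation can gain from accelerating only these kernels. The sketch below applies Amdahl's law to those two approximate figures. The kernel speedups in the loop are hypothetical values chosen for illustration, not measurements from the project.

```python
def amdahl_speedup(accelerated_fraction: float, kernel_speedup: float) -> float:
    """Overall speedup when only `accelerated_fraction` of the runtime gets faster."""
    if not 0.0 <= accelerated_fraction <= 1.0:
        raise ValueError("accelerated_fraction must lie in [0, 1]")
    if kernel_speedup <= 0.0:
        raise ValueError("kernel_speedup must be positive")
    remaining = 1.0 - accelerated_fraction
    return 1.0 / (remaining + accelerated_fraction / kernel_speedup)


# Approximate runtime shares taken from the slide.
MOMENTUM_FRACTION = 0.60
SPMV_FRACTION = 0.10
accelerated = MOMENTUM_FRACTION + SPMV_FRACTION

# Hypothetical kernel speedups, for illustration only.
for kernel_speedup in (2.0, 10.0, 100.0):
    overall = amdahl_speedup(accelerated, kernel_speedup)
    print(f"kernel speedup {kernel_speedup:>5.0f}x -> overall {overall:.2f}x")

# Limit for infinitely fast kernels: the untouched 30% of the runtime remains.
upper_bound = 1.0 / (1.0 - accelerated)
print(f"upper bound: {upper_bound:.2f}x")
```

With about 70% of the runtime in the two kernels, the overall gain cannot exceed roughly 3.3x however fast the kernels become, so the remaining code paths matter as well.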
Use Case Smart City
Numerical results
MareNostrum4: 2x Intel Xeon Platinum 8160 24C @2.1 GHz
vs
NVIDIA Xavier: 512-core Volta GPU + 8-core ARM v8.2 64-bit CPU
Use Case Smart Home
Overview
• Smart mirror as intuitive interaction interface for the smart home
• Displays personalized information on the mirror surface
• Can be controlled by gestures or voice commands
• Neural networks for many modules
• YOLO for object and gesture recognition
• WiderFace and FaceNet for face recognition
• Mozilla's DeepSpeech for speech recognition
• Many more features are in development (e.g. user behavior prediction)
• Idea was developed based on the implementation in a research apartment
• In cooperation with industry and the EU's leading social service company
• Controls wardrobe and presents personalized information for the user
Use Case Smart Home
Distribution to embedded hardware - Approach with OmpSs@Cluster
[Figure: block diagram of the smart mirror pipeline. Labelled components: Peripheral Input (Camera Broadcast, Microphone Input); Face Recognition with WiderFace (face detection), FaceNet (face representation) and Classifier, on the camera image (< 1 m depth range); Object Detection and Gesture Detection with YOLO, on the camera image (full image range); Person Tracking with Tracker (Kalman Filter); Speech Recognition with DeepSpeech (Audio Stream, Transcript); Fusion of Detections (Combinatorial Logic); Decision Maker (Identities, Objects & Gestures); MagicMirror² with Display, Speaker, and Messages from/to other Smart Home devices. Legend: Shared Memory Object, DNN (GPU computed).]
Use Case Smart Home
Distribution to embedded hardware - Approach for OmpSs
[Figure: block diagram with the same labelled components as on the previous slide.]
Use Case Smart Home
Baseline, hardware steps and metrics
• Evaluation metrics
− Performance (FPS), goal: 5 – 10 FPS
− Total power consumption (W), goal: 50 W
− Energy efficiency (FPS/W)
• First prototype (baseline)
− 2x GeForce 1080Ti, Intel i7 7700K @ 4.2 GHz
− 12 FPS for face, object and gesture recognition @ 650 W
• Second prototype
− 2x GeForce 2070, Intel i9-9900K @ 3.60 GHz
− 25 FPS for face, object and gesture recognition @ 430 W
• Embedded hardware prototype
− 2x Nvidia AGX Xavier modules coupled via PCIe
− 16 FPS for face, object and gesture recognition @ 55 W
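The energy-efficiency metric (FPS/W) follows directly from the frame rate and power figures of the three prototypes. The short sketch below recomputes it from the numbers on the slide. The dictionary layout and function name are illustrative choices, not part of the project's tooling.

```python
# Frame rate (FPS) and total power (W) as reported for each prototype.
PROTOTYPES = {
    "First prototype (baseline)": (12.0, 650.0),
    "Second prototype": (25.0, 430.0),
    "Embedded hardware prototype": (16.0, 55.0),
}


def efficiency(fps: float, watts: float) -> float:
    """Energy efficiency in frames per second per watt."""
    return fps / watts


baseline_fps, baseline_watts = PROTOTYPES["First prototype (baseline)"]
baseline = efficiency(baseline_fps, baseline_watts)

for name, (fps, watts) in PROTOTYPES.items():
    value = efficiency(fps, watts)
    print(f"{name}: {value:.3f} FPS/W ({value / baseline:.1f}x baseline)")
```

The embedded prototype reaches about 0.29 FPS/W, close to 16 times the baseline's efficiency. It exceeds the frame-rate goal while drawing 55 W, slightly above the 50 W power goal.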
Use Case Smart Home
Benchmarks
[Figure: two charts, Frame rate (FPS, axis 0 to 30) and Energy Consumption (Watt, axis 0 to 700), for the configurations Start Point, First Optimizations, Introduction of Tensor Cores, Second Optimizations, Nvidia Xavier, Dual Nvidia Xavier, and Goal (10 FPS).]
Use Case Smart Home
Benchmarks
[Figure: chart of energy efficiency (FPS/Watt, axis 0 to 0.3) for the configurations Start Point, First Optimizations, Introduction of Tensor Cores, Second Optimizations, Nvidia Xavier, Dual Nvidia Xavier, and Goal (10 FPS).]
Use Case Infection Research
Overview
• Using statistical methods to research effectiveness of drugs, vaccination strategies and harmfulness of pathogens
• Basis: Pilot studies
− Small sample size but large number of features (biomarker candidates)
− Distinguishing real correlations from random correlations
• Strategy
− Select top features from thousands that can predict the cases
− Estimate the appropriate sample size to get significant results
Use Case Infection Research
Optimization plan
• Optimisation goals
− Decrease simulation runtime
− Enable handling of larger data sets and more complex problems
− Decrease energy consumption
• Evaluation metrics
− Runtime of the complete calculation
− Energy consumption for the complete calculation
Use Case Infection Research
Results
1. Evaluate whether the data contain more biomarker candidates than would be expected in random data
2. Use a simple and fast filter method (evaluation of each individual biomarker candidate)
3. Develop machine learning models to evaluate (combinations of) biomarker candidates
4. Find the best small combination from the selected candidate biomarkers
• Accelerated with Maxeler DFE: speedup 822x
• Accelerated with XITAO: speedup 3x (further optimization ongoing)
• Development enabled by LEGaTO hardware: speedup 40x
• Transformation accelerated with OmpSs@FPGA: speedup 10x
• Integrated with SCONE to enable data security
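The first two steps (checking whether the data hold more biomarker candidates than random data would, using a fast per-feature filter) can be illustrated with a label-permutation check. The sketch below is an assumption-laden illustration, not the project's method: the Pearson-correlation filter, the 0.5 threshold, the synthetic data generator and all function names are hypothetical choices.

```python
import random
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equally long sequences (0.0 if one is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def count_candidates(features, labels, threshold):
    """Filter step: count features whose |correlation| with the labels exceeds threshold."""
    return sum(1 for column in features if abs(pearson(column, labels)) > threshold)


def permutation_check(features, labels, threshold=0.5, n_permutations=100, seed=0):
    """Compare the candidate count on real labels with counts on shuffled labels."""
    rng = random.Random(seed)
    observed = count_candidates(features, labels, threshold)
    shuffled = list(labels)
    null_counts = []
    for _ in range(n_permutations):
        rng.shuffle(shuffled)  # destroys any real feature/label association
        null_counts.append(count_candidates(features, shuffled, threshold))
    exceed = sum(1 for count in null_counts if count >= observed)
    p_value = (1 + exceed) / (1 + n_permutations)
    return observed, null_counts, p_value


def make_synthetic_pilot(n_samples=20, n_features=200, n_informative=5, seed=1):
    """Hypothetical pilot study: few samples, many features, a handful informative."""
    rng = random.Random(seed)
    labels = [i % 2 for i in range(n_samples)]
    features = []  # one list of sample values per feature
    for j in range(n_features):
        if j < n_informative:
            column = [label + rng.gauss(0.0, 0.2) for label in labels]
        else:
            column = [rng.gauss(0.0, 1.0) for _ in labels]
        features.append(column)
    return features, labels


features, labels = make_synthetic_pilot()
observed, null_counts, p_value = permutation_check(features, labels)
print(f"candidates in data: {observed}")
print(f"mean candidates after label shuffling: {sum(null_counts) / len(null_counts):.1f}")
print(f"permutation p-value: {p_value:.3f}")
```

Because every permutation repeats the filter over all features, the cost grows with features times permutations, which is the kind of workload where the accelerations listed above would matter.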
Use Case Machine Learning
Overview
Motivation
• Deep Learning (DL), while very powerful, has high energy consumption and computation time due to the large amount of compute needed for a single prediction
• Has great potential, e.g. in Computer Vision and Natural Language Processing
Goals, Metrics
• 10x decrease in energy consumption
• 10x increase in inference speed
− Increase throughput, reduce latency
• Showcase on several Deep Learning models relevant to autonomous driving
• Hardware support for: Nvidia TX2 & Xavier, ARM CPU, Intel CPU, Xilinx FPGA
[Figure: pixel-wise semantic segmentation of a crossing in Zurich. Source: Cityscapes dataset]
Use Case Machine Learning
Deep Learning optimisation engine
Use Case Machine Learning
Results
Relative energy efficiency improvements from DL Optimiser
Platform                         ResNet18   VGG16   YoloV3
Intel Core i9 9900K              5.4x       8.0x    6.1x
Raspberry Pi 3 B+                10.3x      11.4x   4.3x
ZedBoard (Xilinx Zynq + ARM)     -          -       -
NVIDIA Jetson AGX Xavier         10.7x      6.6x    10.1x
NVIDIA Jetson AGX Xavier + DLA   2.3x       -       2.7x
NVIDIA Jetson TX2                7.2x       5.9x    3.0x
Relative speed-up improvements from DL Optimiser
Platform                         ResNet18   VGG16   YoloV3
Intel Core i9 9900K              7.4x       9.0x    3.2x
Raspberry Pi 3 B+                6.6x       11.0x   1.5x
ZedBoard (Xilinx Zynq + ARM)     -          -       -
NVIDIA Jetson AGX Xavier         3.7x       2.6x    6.1x
NVIDIA Jetson AGX Xavier + DLA   2.2x       -       1.9x
NVIDIA Jetson TX2                4.6x       4.2x    2.8x
Use Case Secure IoT Gateway
Typical IoT Scenario
[Figure: network diagram with a network gateway at Customer 1, a network gateway at Customer 2, and a Data Center.]
Use Case Secure IoT Gateway
Secured IoT Scenario
[Figure: network diagram with a Gateway Cluster, a Data Center, and, at each of Customer 1 and Customer 2, a network gateway, an IoT bridge (WiFi) and an IoT bridge.]
Use Case Secure IoT Gateway
Secured IoT Scenario
[Figure: network diagram with a Network cockpit (Admin / Customer), a Gateway Cluster, a Data Center, and, at each of Customer 1 and Customer 2, a network gateway, an IoT bridge (WiFi) and an IoT bridge.]
Use Case Secure IoT Gateway
Optimization plan
• Evaluation of LEGaTO flow for Secure IoT Gateway
− Bottleneck is encryption for VPN
− AES encryption on CPU hardware crypto units (AES-NI) is optimal
− Not participating in the general optimization process
• Supporting the Smart Home use case
• Evaluation metrics
− Usability
− Network performance
  • Throughput (MB/s)
  • Ping latency (ms)
− TCO
Thanks!
Micha vor dem Berge
christmann informationstechnik + medien
