EM3V - Embedded Vision
Sundance Multiprocessor Technology, Ltd.
Pedro Machado pedro.m@sundance.com
Fatima Kishwar fatima.k@sundance.com
Flemming Christensen flemming.c@sundance.com
7/16/2018
OVERVIEW
• Company profile
• Introduction
• Technologies
• VCS-1 (EMC2) System
• Discussion
• Future Work
SERVICE AND EXCELLENCE!
Established in 1989 by Flemming CHRISTENSEN
• Employee Owned and a ’Life-Style’ company
• 10x people with 300+ years experience
• 4x with accredited Xilinx FPGA training
• Always designed and built our own products
• BSI - ISO9001-2015 certified, since 2003
Technology Focus
• Acceleration, Vision, Sensor & Robotics
MODULAR, RECONFIGURABLE - LIKE LEGO®
• Reduced time-to-market
• Rapid Prototyping
• Reconfigurable
• Flexible
• Modular
• Scalable
• Reliable
Year 1989
TURN-KEY SYSTEM SOLUTIONS FROM A-Z
• Custom Bespoke Systems
• Hazardous Environment
• Custom Integration Service
• Design of Enclosures
• Commercial Off-The-Shelf (COTS) Systems
• Flexible and Upgradeable
• Design for Excellence
• Maintenance with ease
INTRODUCTION
 Increasing demand for High Performance Computing
 Everyone wants more compute-power
 Finer time-steps; larger data-sets; better models
 Decreasing single-threaded performance
 Emphasis on multi-core CPUs and parallelism
 Do computational biologists need to learn PThreads?
 Increasing focus on power and space
 Boxes are cheap: 16 node clusters are very affordable
 Where do you put them? Who is paying for power?
 How can we use hardware acceleration to help?
TYPES OF HARDWARE ACCELERATOR
 GPU : Graphics Processing Unit
 Many-core - 30 SIMD processors per device
 High bandwidth, low complexity memory – no caches
 MPPA : Massively Parallel Processor Array
 Grid of simple processors – 300 tiny RISC CPUs
 Point-to-point connections on 2-D grid
 FPGA : Field Programmable Gate Array
 Fine-grained grid of logic and small RAMs
 Build whatever you want
HARDWARE ADVANTAGES: PERFORMANCE
A Comparison of CPUs, GPUs, FPGAs, and MPPAs for Random Number Generation, D. Thomas, L. Howes, and W. Luk, In Proc. of FPGA, pgs. 22-24, 2009
[Chart: speed-up relative to CPU. CPU: 1x, GPU: 10x, MPPA: 1x, FPGA: 31x]
• More parallelism - more performance
• GPU: 30 cores, 16-way SIMD
• MPPA: 300 tiny RISC cores
• FPGA: hundreds of parallel functional units
HARDWARE ADVANTAGES: POWER
A Comparison of CPUs, GPUs, FPGAs, and MPPAs for Random Number Generation, D. Thomas, L. Howes, and W. Luk, In Proc. of FPGA, pgs. 22-24, 2009
[Chart: speed-up and efficiency relative to CPU.
Speed-up: CPU 1x, GPU 10x, MPPA 1x, FPGA 31x
Efficiency: CPU 1x, GPU 9x, MPPA 18x, FPGA 175x]
• GPU: 1.2GHz - same power as CPU
• MPPA: 300MHz - Same performance as CPU, but 18x less power
• FPGA: 300MHz - faster and less power
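The speed-up and efficiency figures on this slide imply a relative power draw for each device. The sketch below works through that arithmetic in Python. It assumes that "efficiency" means performance per watt relative to the CPU, which the slide does not state explicitly; the names `speedup`, `efficiency`, and `relative_power` are illustrative only.

```python
# Speed-up and efficiency relative to the CPU, as given on this slide.
# Assumption: "efficiency" is performance per watt relative to the CPU,
# so relative power = speed-up / efficiency.
speedup = {"CPU": 1, "GPU": 10, "MPPA": 1, "FPGA": 31}
efficiency = {"CPU": 1, "GPU": 9, "MPPA": 18, "FPGA": 175}


def relative_power(device):
    """Power draw relative to the CPU under the assumption above."""
    return speedup[device] / efficiency[device]


for device in speedup:
    print(f"{device}: {relative_power(device):.2f}x CPU power")
```

Under this reading the GPU draws roughly the same power as the CPU (about 1.1x), the MPPA about one eighteenth, and the FPGA a little under a fifth while running 31x faster.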
FPGA ACCELERATED APPLICATIONS
 Finance
 2006: Option pricing: 30x CPU
 2007: Multivariate Value-at-Risk: 33x Quad CPU
 2008: Credit-risk analysis: 60x Quad CPU
 Bioinformatics
 2007: Protein Graph Labelling: 100x Quad CPU
 Neural Networks
 2008: Spiking Neural Networks: 4x Quad CPU, 1.1x GPU
All with less than a fifth of the power
PROBLEM: DESIGN EFFORT
 Researchers love scripting languages: Matlab, Python, Perl
 Simple to use and understand, lots of libraries
 Easy to experiment and develop promising prototype
 Eventually prototype is ready: need to scale to large problems
 Need to rewrite prototype to improve performance: e.g. Matlab to C
 Simplicity of prototype is hidden by layers of optimisation
[Chart: relative performance (0.25x to 256x) against design time (hour to year) for a CPU; stages: Scripted, Compiled, Multi-threaded, Vectorised]
PROBLEMS: DESIGN EFFORT
 GPUs provide a somewhat gentle learning curve
 CUDA and OpenCL almost allow compilation of ordinary C code
 User must understand GPU architecture to maximise speed-up
 Code must be radically altered to maximise use of functional units
 Memory structures and accesses must map onto physical RAM banks
 We are asking the user to learn about things they don’t care about
[Chart: relative performance (0.25x to 256x) against design time (hour to year) for CPU and GPU; stages: Compiled, C to GPU, Reorganisation, Memory Opt.]
PROBLEMS: DESIGN EFFORT
 FPGAs provide large speed-up and power savings – at a price!
 Days or weeks to get an initial version working
 Multiple optimisation and verification cycles to get high performance
 Too risky and too specialised for most users
 Months of retraining for an uncertain speed-up
 Currently only used in large projects, with dedicated FPGA engineer
[Chart: relative performance (0.25x to 256x) against design time (hour to year) for CPU, GPU, and FPGA; stages: Initial Design, Parallelisation, Clock Rate]
GOAL: FPGAS FOR THE MASSES
 Accelerate niche applications with limited user-base
 Don’t have to wait for traditional “heroic” optimisation
 Single-source description
 The prototype code is the final code
 Encourage experimentation
 Give users freedom to tweak and modify
 Target platforms at multiple scales
 Individual user; Research group; Enterprise
 Use domain specific knowledge about applications
 Identify bottlenecks: optimise them
 Identify design patterns: automate them
 Don’t try to do general purpose “C to hardware”
WHICH TECHNOLOGY?
 Clear architectural trend of parallelism and heterogeneity
 Heterogeneous devices have many tradeoffs
 Usage cases also affect best device choice
 Problem: huge design space
[Diagram: a problem and its solution mapped onto CPUs, with execution times of 10 sec and 2.5 sec; CPU with FPGA over PCI; CPU with GPU over PCI]
[Chart: orders-of-magnitude improvement in clock rate, complexity, parallelism, and power relative to a base, for CPU, multiple CPUs, GPU, and FPGA]
[Chart: execution time against task size for sequential CPU, multicore CPU, GPU, and FPGA]
Huge Design Space
• Which accelerator?
• Which brand?
• Number of cores?
• Which device?
• Device cost?
• Design time?
• Which algorithm?
• Use-case optimization?
TYPICAL CV APPLICATIONS: SLIDING WINDOW
 Contribution: thorough analysis of devices and use cases for sliding window applications
 Sliding window used in many domains, including image processing and embedded systems
Devices: CPU, GPU, FPGA
Algorithms: Sum of Absolute Differences, Convolution, Correntropy
Use cases: kernel size, image size
TYPICAL CV APPLICATIONS: SLIDING WINDOW APPLICATIONS
 Consider a 2D Sliding Window with 16-bit grayscale image inputs
 Applies window function against a window from image and the kernel
 “Slides” the window to get the next input
 Repeats for every possible window
 45x45 kernel on 1080p 30-FPS video = 120 billion memory accesses/second
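The memory-access figure can be checked with a short calculation. The Python sketch below assumes that only fully overlapping windows are evaluated and that every window pixel is read once per window, with no reuse between neighbouring windows; the slide does not say how its figure was derived, so these are assumptions, and the variable names are illustrative.

```python
# Rough count of window-pixel reads per second for a 2D sliding window.
# Assumptions: only fully overlapping windows are evaluated, and every
# window pixel is read once per window (no reuse between windows).
width, height = 1920, 1080  # 1080p frame
n = m = 45                  # kernel (and window) size
fps = 30

windows_per_frame = (height - n + 1) * (width - m + 1)
reads_per_second = windows_per_frame * n * m * fps

print(f"{reads_per_second / 1e9:.0f} billion reads/second")
```

Under these assumptions the count comes to about 118 billion reads per second, in line with the roughly 120 billion quoted above.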
[Diagram: each window of the image (Window 0 to Window W-1) is combined with the kernel by the window function to produce one output pixel]
Input: image of size x×y, kernel of size n×m
for (row = 0; row <= x-n; row++) {
  for (col = 0; col <= y-m; col++) {
    // get n*m pixels (i.e., the window
    // starting at the current row and col)
    window = image[row:row+n-1][col:col+m-1]
    output[row][col] = f(window, kernel)
  }
}
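The pseudocode above can be turned into a small runnable Python sketch. This is an illustration only: it uses plain nested lists rather than an image library, visits only fully overlapping windows, and the names `sliding_window` and `window_sum` are hypothetical.

```python
def sliding_window(image, kernel, f):
    """Apply window function f to every fully overlapping window."""
    x, y = len(image), len(image[0])    # image rows, columns
    n, m = len(kernel), len(kernel[0])  # kernel rows, columns
    output = []
    for row in range(x - n + 1):
        out_row = []
        for col in range(y - m + 1):
            # n*m pixels starting at the current row and col
            window = [r[col:col + m] for r in image[row:row + n]]
            out_row.append(f(window, kernel))
        output.append(out_row)
    return output


def window_sum(window, kernel):
    """Placeholder window function: sums the window, ignores the kernel."""
    return sum(sum(r) for r in window)


image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
kernel = [[0, 0],
          [0, 0]]
result = sliding_window(image, kernel, window_sum)
print(result)  # [[12, 16], [24, 28]]
```

Passing the window function as a parameter mirrors `f(window, kernel)` in the pseudocode, so the same loop can serve different window functions.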
APP 1: SUM OF ABSOLUTE DIFFERENCES (SAD)
 Used for: H.264 encoding, object identification
 Window function: point-wise absolute difference, followed by
summation
[Diagram: window pixels W1..W4 and kernel values k1..k4 are subtracted point-wise (d1..d4), passed through an absolute value (a1..a4), and summed into output pixel Ox]
Ox = 0;
For pixel in window:
  Ox += abs(pixel_i - kernel_i)
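A minimal Python sketch of the SAD window function described above, assuming the window and kernel are same-sized nested lists; the function name `sad` is illustrative.

```python
def sad(window, kernel):
    """Sum of absolute differences between two same-sized 2D lists."""
    return sum(abs(w - k)
               for w_row, k_row in zip(window, kernel)
               for w, k in zip(w_row, k_row))


window = [[10, 12],
          [9, 7]]
kernel = [[8, 12],
          [10, 4]]
print(sad(window, kernel))  # 2 + 0 + 1 + 3 = 6
```

A result of zero means the window matches the kernel exactly; larger values mean a poorer match.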
APP 2: 2D CONVOLUTION
 Used for: filtering, edge detection
 Window function: point-wise product followed by summation
[Diagram: window pixels W1..W4 are multiplied point-wise by the reversed kernel values k4..k1 (products p1..p4) and summed into output pixel Ox]
Ox = 0;
For pixel in window:
  Ox += (pixel_i × kernel_(n-i))
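A minimal Python sketch of the convolution window function described above. It assumes same-sized nested lists and flips the kernel by reversing its flattened form, which matches the reversed kernel indexing in the pseudocode; the name `conv_window` is illustrative.

```python
def conv_window(window, kernel):
    """Point-wise product of the window with the flipped kernel, summed."""
    flat_w = [w for row in window for w in row]
    flat_k = [k for row in kernel for k in row]
    # Reversing the flattened kernel flips it in both dimensions.
    return sum(w * k for w, k in zip(flat_w, reversed(flat_k)))


window = [[1, 2],
          [3, 4]]
kernel = [[1, 0],
          [0, -1]]
print(conv_window(window, kernel))  # 1*(-1) + 2*0 + 3*0 + 4*1 = 3
```

Without the flip the same loop would compute cross-correlation rather than convolution; the two agree only for kernels that are symmetric under the flip.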
APP 3: CORRENTROPY
 Used for: optical flow, obstacle avoidance
 Window function: Gaussian of point-wise absolute difference,
followed by summation
[Diagram: the absolute differences a1..a4 between window pixels W1..W4 and kernel values k1..k4 are passed through a Gaussian function (g1..g4) and summed into output pixel Ox]
Ox = 0;
For pixel in window:
  Ox += Gauss(abs(pixel_i - kernel_i))
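A minimal Python sketch of the correntropy window function described above. The slides do not give the exact Gaussian, so an unnormalised form exp(-d²/(2σ²)) with a hypothetical `sigma` parameter is assumed here; the name `correntropy_window` is also illustrative.

```python
import math


def correntropy_window(window, kernel, sigma=1.0):
    """Sum of a Gaussian applied to point-wise absolute differences.

    Assumption: unnormalised Gaussian exp(-d^2 / (2 * sigma^2)).
    """
    total = 0.0
    for w_row, k_row in zip(window, kernel):
        for w, k in zip(w_row, k_row):
            d = abs(w - k)
            total += math.exp(-(d * d) / (2.0 * sigma * sigma))
    return total


window = [[10, 12],
          [9, 7]]
kernel = [[8, 12],
          [10, 4]]
print(correntropy_window(window, window))  # identical inputs: 4.0
print(correntropy_window(window, kernel))  # smaller for a poorer match
```

Unlike SAD, a larger value here means a closer match, and large outlier differences contribute almost nothing to the sum.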
HETEROGENEOUS COMPUTING SYSTEM IN TOP500 LIST
 Reason: Significant performance/energy-efficiency boost from GPU/CPU
[Chart: heterogeneous computing systems in the Top500, by year.
2007: 1; 2008: 8; 2009: 7; 2010: 17; 2011: 39; 2012: 62; 2013: 53; 2014: 75; 2015: 102; 2016 (June): 94]
GPU: SPECIALIZED ACCELERATOR FOR A SET OF APPLICATIONS
 Specialized accelerator for data-parallel applications
 Optimized for processing massive data
 Give up unrelated goals and features
 Give up optimizing latency for processing single data
 Give up branch prediction, out-of-order execution
 Give up large traditional cache hierarchy
 More resources for parallel processing
 More cores, more ALUs
[Diagram: four parallel lanes, each Data, Core, Data]
CREATING APPLICATION-SPECIFIC ACCELERATOR WITH FPGA
 Only provides primitive building blocks for computation
 Registers, addition/multiplication, memories, programmable Boolean operations and connections
 Build application-specific accelerator from primitive building blocks
 Interconnection between primitive functional units
 Timing of data movement between primitive functional units
 Opportunities for optimizations for a specific application!
 Maximizing efficiency while throwing away redundancy
Virtex®-7 FPGA
• Precise, Low Jitter Clocking: MMCMs
• Logic Fabric: LUT-6 CLB
• DSP Engines: DSP48E1 Slices
• On-Chip Memory: 36Kbit/18Kbit Block RAM
• Enhanced Connectivity: PCIe® Interface Blocks
• Hi-perf. Parallel I/O Connectivity: SelectIO™ Technology
• Hi-performance Serial I/O Connectivity: Transceiver Technology
THE CHALLENGES OF PROMOTING FPGAS AMONG SOFTWARE ENGINEERS
 Require tremendous efforts
 Extensive knowledge of digital circuit design
 The potential of FPGAs is not easily accessible by common software engineers
[Figure: FPGA design terms: AXI Master, Burst inference, DSP48, Timing Closure, Stable interface, Loop rewind]
HYBRID ARCHITECTURE
• Zynq-7000 devices are equipped with dual-core ARM Cortex-A9 processors integrated with 28nm Artix-7 or Kintex®-7 based programmable logic for excellent performance-per-watt and maximum design flexibility.
• Sundance’s EMC2 carrier board is compatible with all the Zynq-7000 Series.
WHAT IS THE BEST SOLUTION?
• Zynq® UltraScale+™ MPSoC devices provide 64-bit processor scalability while combining real-time control with soft and hard engines for graphics, video, waveform, and packet processing.
• Sundance’s EMC2 carrier board is compatible with all the Zynq® UltraScale+™ MPSoC devices. Our focus is now on the XCZU4EV device (automotive grade).
VCS-1 (EMC2) HARDWARE FEATURES
Connectivity:
 FM191-R; FMC-LPC to:
 15x Digital I/Os [DB9]
 12x Analogue Inputs [DB9]
 8x Analogue Outputs [DB9]
 1x Expansion [SEIC]
 FM191-U; SEIC to:
 4x USB3.0 [USB-C]
 28x GPIO [40-pin GPIO]
 FM191-A1; 40-pin GPIO
 28x GPIO [DB9]
VCS-1 (EMC2) SENSORS COMPATIBILITY
The ZU4EV MPSoC is compatible with a wide range of sensors.
• Depth sensor (up to 20m). Interface: USB3.0
• Thermal imaging. Interface: Eth
• Movidius Neural Compute Stick. Interface: USB3.0
• JAI AD-130GE. Interface: Eth
VCS-1 (EMC2) COMPATIBILITY
VCS-1 features:
 Raspberry Pi and Arduino compatible;
 Compatible with most Arduino/Raspberry Pi sensors and actuators;
 4x USB3.0 ports for interfacing with a wide
range of sensors;
 MQTT and OpenCV compatible
 ROS compatible
 ROS2 ready
 HIPPEROS ready
DEEP LEARNING ON THE VCS-1 (EMC2)
The VCS-1 will be fully compatible with the Xilinx reVISION stack.
 Includes support for the most popular neural networks, including AlexNet, GoogLeNet, VGG, SSD, and FCN.
 Optimized implementations for CNN network layers, required to build custom neural networks (DNN/CNN)
[Figure: Xilinx reVISION stack]
VCS-1 (EMC2) OPEN SOURCE SOFTWARE AND FIRMWARE
Open Source Hardware/software and online documentation:
 Open Hardware Repository
https://www.ohwr.org/projects/emc2-dp
 Microsoft Windows/Linux 64-bit SDK
https://github.com/SundanceMultiprocessorTechnology/VCS-1_SDK
 FM191 ARM SDK
https://github.com/SundanceMultiprocessorTechnology/VCS-1_FM191_SDK
 FM191 Firmware
https://github.com/SundanceMultiprocessorTechnology/VCS-1_FM191_FW
 EMC2 ROS
https://github.com/SundanceMultiprocessorTechnology/VCS-1_emc2_ros
VCS-1 (EMC2) ENCLOSURE
A custom enclosure was specially designed to accommodate the VCS-1 system.
DISCUSSION
The VCS-1 has the following
characteristics:
1. High performance
2. Low power consumption (24V@0.595A)
3. Highly compatible with a wide
range of commercially
available sensors and actuators
4. Highly optimised for computer
vision applications
5. Fully reconfigurable
WHAT NEXT?
 Be part of our customers' family by ordering our products or hiring our services.
 Sundance University Program (SUP)
 Access to hardware prototypes
 Advisory Board Members
 BSc/MSc/PhD Internships
 Funding capture
 H2020
 InnovateUK
 EPSRC
Learn more about SUP and ongoing projects:
https://www.sundance.com/sundance-in-eu-projects-programs/
UNIVERSITY CLIENTS
INDUSTRIAL CLIENTS
QUESTIONS?
Sundance's presentation at B:RAI 2020
 
Sundance at the 49th Intelligent Sensing Program
Sundance at the 49th Intelligent Sensing ProgramSundance at the 49th Intelligent Sensing Program
Sundance at the 49th Intelligent Sensing Program
 
Sundance VCS-1 for Precision Robotics
Sundance VCS-1 for Precision RoboticsSundance VCS-1 for Precision Robotics
Sundance VCS-1 for Precision Robotics
 
TULIPP at the 10th Intelligent Imaging Event
TULIPP at the 10th Intelligent Imaging EventTULIPP at the 10th Intelligent Imaging Event
TULIPP at the 10th Intelligent Imaging Event
 
Sundance TULIPP Workshop at Nottingham Trent University
Sundance TULIPP Workshop at Nottingham Trent UniversitySundance TULIPP Workshop at Nottingham Trent University
Sundance TULIPP Workshop at Nottingham Trent University
 
TULIPP Starter Kit – AGRI
TULIPP Starter Kit – AGRITULIPP Starter Kit – AGRI
TULIPP Starter Kit – AGRI
 
System Design on Zynq using SDSoC
System Design on Zynq using SDSoCSystem Design on Zynq using SDSoC
System Design on Zynq using SDSoC
 
Re-Vision stack presentation
Re-Vision stack presentationRe-Vision stack presentation
Re-Vision stack presentation
 
Moving object detection on FPGA
Moving object detection on FPGAMoving object detection on FPGA
Moving object detection on FPGA
 
ANPR FPGA Workshop
ANPR FPGA WorkshopANPR FPGA Workshop
ANPR FPGA Workshop
 
HiPEAC 2018 - CPS, why all the fuss?
HiPEAC 2018 - CPS, why all the fuss?HiPEAC 2018 - CPS, why all the fuss?
HiPEAC 2018 - CPS, why all the fuss?
 
Sundance HiPEAC 2018 Presentation
Sundance HiPEAC 2018 PresentationSundance HiPEAC 2018 Presentation
Sundance HiPEAC 2018 Presentation
 
TULIPP - Leaving a legacy: The ultimate Low-Power Image Processing Handbook
TULIPP - Leaving a legacy: The ultimate Low-Power Image Processing HandbookTULIPP - Leaving a legacy: The ultimate Low-Power Image Processing Handbook
TULIPP - Leaving a legacy: The ultimate Low-Power Image Processing Handbook
 
TULIPP at NMI 18-5-17
TULIPP at NMI 18-5-17TULIPP at NMI 18-5-17
TULIPP at NMI 18-5-17
 
Open VPX Tutorial
Open VPX TutorialOpen VPX Tutorial
Open VPX Tutorial
 
VF360 OpenVPX Board w. Altera Stratix and TI KeyStone DSP
VF360 OpenVPX Board w. Altera Stratix and TI KeyStone DSPVF360 OpenVPX Board w. Altera Stratix and TI KeyStone DSP
VF360 OpenVPX Board w. Altera Stratix and TI KeyStone DSP
 
Stack PC in PC104 Land
Stack PC in PC104 LandStack PC in PC104 Land
Stack PC in PC104 Land
 
EMC2 Xilinx SDSoC presentation
EMC2 Xilinx SDSoC presentationEMC2 Xilinx SDSoC presentation
EMC2 Xilinx SDSoC presentation
 
Pc 104 series 1 application showcase
Pc 104 series 1 application showcasePc 104 series 1 application showcase
Pc 104 series 1 application showcase
 

Recently uploaded

UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4
DianaGray10
 
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Jeffrey Haguewood
 
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Thierry Lestable
 
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdfFIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance
 
Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...
Product School
 
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMsTo Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
Paul Groth
 
"Impact of front-end architecture on development cost", Viktor Turskyi
"Impact of front-end architecture on development cost", Viktor Turskyi"Impact of front-end architecture on development cost", Viktor Turskyi
"Impact of front-end architecture on development cost", Viktor Turskyi
Fwdays
 
Epistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI supportEpistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI support
Alan Dix
 
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
DanBrown980551
 
Knowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and backKnowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and back
Elena Simperl
 
When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...
Elena Simperl
 
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdfSmart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
91mobiles
 
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
BookNet Canada
 
IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptx
IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptxIOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptx
IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptx
Abida Shariff
 
The Future of Platform Engineering
The Future of Platform EngineeringThe Future of Platform Engineering
The Future of Platform Engineering
Jemma Hussein Allen
 
Key Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdfKey Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdf
Cheryl Hung
 
JMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and GrafanaJMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and Grafana
RTTS
 
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
Product School
 
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Tobias Schneck
 
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
UiPathCommunity
 

Recently uploaded (20)

UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4
 
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
 
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
 
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdfFIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
 
Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...
 
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMsTo Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
 
"Impact of front-end architecture on development cost", Viktor Turskyi
"Impact of front-end architecture on development cost", Viktor Turskyi"Impact of front-end architecture on development cost", Viktor Turskyi
"Impact of front-end architecture on development cost", Viktor Turskyi
 
Epistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI supportEpistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI support
 
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
 
Knowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and backKnowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and back
 
When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...
 
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdfSmart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
 
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
 
IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptx
IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptxIOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptx
IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptx
 
The Future of Platform Engineering
The Future of Platform EngineeringThe Future of Platform Engineering
The Future of Platform Engineering
 
Key Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdfKey Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdf
 
JMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and GrafanaJMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and Grafana
 
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
 
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
 
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
 

EM3V - Embedded Vision - Sundance

  • 1. EM3V - Embedded Vision Sundance Multiprocessor Technology, Ltd. Pedro Machado pedro.m@sundance.com Fatima Kishwar fatima.k@sundance.com Flemming Christensen flemming.c@sundance.com 7/16/2018 1
  • 2. OVERVIEW • Company profile • Introduction • Technologies • VCS-1 (EMC2) System • Discussion • Future Work
  • 3. SERVICE AND EXCELLENCE! Established in 1989 by Flemming Christensen • Employee-owned and a 'Life-Style' company • 10 people with 300+ years of combined experience • 4 with accredited Xilinx FPGA training • Always designed and built our own products • BSI - ISO 9001:2015 certified, since 2003 • Technology focus: Acceleration, Vision, Sensor & Robotics
  • 4. MODULAR, RECONFIGURABLE - LIKE LEGO® • Reduced time-to-market • Rapid Prototyping • Reconfigurable • Flexible • Modular • Scalable • Reliable [Image label: Year 1989]
  • 5. TURN-KEY SYSTEM SOLUTIONS FROM A-Z • Custom Bespoke Systems • Hazardous Environment • Custom Integration Service • Design of Enclosures • Commercial Off-The-Shelf (COTS) Systems • Flexible and Upgradeable • Design for Excellence • Easy Maintenance
  • 6. INTRODUCTION • Increasing demand for High Performance Computing • Everyone wants more compute power • Finer time-steps; larger data-sets; better models • Decreasing single-threaded performance • Emphasis on multi-core CPUs and parallelism • Do computational biologists need to learn PThreads? • Increasing focus on power and space • Boxes are cheap: 16-node clusters are very affordable • Where do you put them? Who is paying for power? • How can we use hardware acceleration to help?
  • 7. TYPES OF HARDWARE ACCELERATOR • GPU: Graphics Processing Unit • Many-core: 30 SIMD processors per device • High-bandwidth, low-complexity memory: no caches • MPPA: Massively Parallel Processor Array • Grid of simple processors: 300 tiny RISC CPUs • Point-to-point connections on a 2-D grid • FPGA: Field Programmable Gate Array • Fine-grained grid of logic and small RAMs • Build whatever you want
  • 8. HARDWARE ADVANTAGES: PERFORMANCE • Source: D. Thomas, L. Howes, and W. Luk, "A Comparison of CPUs, GPUs, FPGAs, and MPPAs for Random Number Generation", in Proc. of FPGA, pp. 22-24, 2009 [Figure: speed-up relative to CPU: CPU 1, GPU 10, MPPA 1, FPGA 31] • More parallelism, more performance • GPU: 30 cores, 16-way SIMD • MPPA: 300 tiny RISC cores • FPGA: hundreds of parallel functional units
  • 9. HARDWARE ADVANTAGES: POWER • Source: D. Thomas, L. Howes, and W. Luk, "A Comparison of CPUs, GPUs, FPGAs, and MPPAs for Random Number Generation", in Proc. of FPGA, pp. 22-24, 2009 [Figure: speed-up and efficiency relative to CPU. Speed-up: CPU 1, GPU 10, MPPA 1, FPGA 31. Efficiency: CPU 1, GPU 9, MPPA 18, FPGA 175] • GPU: 1.2GHz - same power as CPU • MPPA: 300MHz - same performance as CPU, but 18x less power • FPGA: 300MHz - faster and less power
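The power bullets on slide 9 can be cross-checked against the chart values (speed-up 1, 10, 1, 31 and efficiency 1, 9, 18, 175 for CPU, GPU, MPPA, FPGA). The sketch below assumes that "efficiency" means performance per watt relative to the CPU, which the slide does not state explicitly; under that assumption, relative power is speed-up divided by efficiency.

```python
# Chart values from slide 9, relative to the CPU baseline.
speed_up = {"CPU": 1, "GPU": 10, "MPPA": 1, "FPGA": 31}
efficiency = {"CPU": 1, "GPU": 9, "MPPA": 18, "FPGA": 175}

# Assumption: efficiency = performance per watt, so power = performance / efficiency.
relative_power = {dev: speed_up[dev] / efficiency[dev] for dev in speed_up}

for dev, power in relative_power.items():
    print(f"{dev}: {power:.3f}x CPU power")
```

Read this way, the GPU draws about the same power as the CPU (10/9), the MPPA about 18x less, and the FPGA roughly 0.18x, which agrees with the three bullets.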
  • 10. FPGA ACCELERATED APPLICATIONS • Finance • 2006: Option pricing: 30x CPU • 2007: Multivariate Value-at-Risk: 33x Quad CPU • 2008: Credit-risk analysis: 60x Quad CPU • Bioinformatics • 2007: Protein Graph Labelling: 100x Quad CPU • Neural Networks • 2008: Spiking Neural Networks: 4x Quad CPU, 1.1x GPU • All with less than a fifth of the power
  • 11. PROBLEM: DESIGN EFFORT • Researchers love scripting languages: Matlab, Python, Perl • Simple to use and understand, lots of libraries • Easy to experiment and develop a promising prototype • Eventually the prototype is ready and needs to scale to large problems • The prototype must be rewritten to improve performance, e.g. Matlab to C • The simplicity of the prototype is hidden by layers of optimisation [Figure: relative performance (0.25x to 256x) against design time (hour to year) for a CPU: scripted, compiled, multi-threaded, vectorised]
  • 12. PROBLEMS: DESIGN EFFORT • GPUs provide a somewhat gentle learning curve • CUDA and OpenCL almost allow compilation of ordinary C code • User must understand GPU architecture to maximise speed-up • Code must be radically altered to maximise use of functional units • Memory structures and accesses must map onto physical RAM banks • We are asking the user to learn about things they don't care about [Figure: relative performance (0.25x to 256x) against design time (hour to year) for CPU and GPU; GPU stages: compiled, C to GPU, reorganisation, memory optimisation]
  • 13. PROBLEMS: DESIGN EFFORT • FPGAs provide large speed-ups and power savings, at a price! • Days or weeks to get an initial version working • Multiple optimisation and verification cycles to get high performance • Too risky and too specialised for most users • Months of retraining for an uncertain speed-up • Currently only used in large projects with a dedicated FPGA engineer [Figure: relative performance (0.25x to 256x) against design time (hour to year) for CPU, GPU and FPGA; FPGA stages: initial design, parallelisation, clock rate]
  • 14. GOAL: FPGAS FOR THE MASSES • Accelerate niche applications with a limited user base • Don't have to wait for traditional "heroic" optimisation • Single-source description • The prototype code is the final code • Encourage experimentation • Give users freedom to tweak and modify • Target platforms at multiple scales • Individual user; research group; enterprise • Use domain-specific knowledge about applications • Identify bottlenecks: optimise them • Identify design patterns: automate them • Don't try to do general-purpose "C to hardware"
  • 15. WHICH TECHNOLOGY? • Clear architectural trend of parallelism and heterogeneity • Heterogeneous devices have many tradeoffs • Use cases also affect the best device choice • Problem: huge design space [Figure: single-CPU versus multi-CPU execution (10 sec versus 2.5 sec); CPU with an FPGA or GPU attached over PCI ("orders of magnitude improvement"); clock rate, complexity, parallelism and power compared across devices; execution time against task size for sequential CPU, multicore CPU, GPU and FPGA] Huge design space: • Which accelerator? • Which brand? • Number of cores? • Which device? • Device cost? • Design time? • Which algorithm? • Use case optimization?
  • 16. TYPICAL CV APPLICATIONS: SLIDING WINDOW • Contribution: thorough analysis of devices and use cases for sliding window applications • Sliding window used in many domains, including image processing and embedded [Figure: design space of devices (multicore CPU, GPU, FPGA), algorithms (correntropy, convolution, sum of absolute differences) and use cases (kernel size, image size)]
  • 17. TYPICAL CV APPLICATIONS: SLIDING WINDOW APPLICATIONS • Consider a 2D sliding window with 16-bit grayscale image inputs • Applies a window function to a window from the image and the kernel • "Slides" the window to get the next input • Repeats for every possible window • 45x45 kernel on 1080p 30-FPS video = roughly 120 billion memory accesses/second [Figure: windows 0 to W-1 and the kernel feed the window function, which produces one output pixel per window]
      Input: image of size x×y, kernel of size n×m
      for (row=0; row <= x-n; row++) {
        for (col=0; col <= y-m; col++) {
          // get n*m pixels (i.e., the window starting at the current row and col)
          window=image[row:row+n-1][col:col+m-1]
          output[row][col]=f(window,kernel)
        }
      }
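The sliding-window loop on slide 17 can be sketched in plain Python as follows. This is an illustrative version, not the authors' implementation: the names `sliding_window`, `accesses_per_second` and `window_sum` are hypothetical, images are nested lists, and only full windows inside the image are visited. The second function reproduces the slide's memory-access estimate under the assumption that each window reads all of its pixels once and that kernel reads are not counted.

```python
def sliding_window(image, kernel, f):
    """Apply window function f to every full n x m window of image."""
    x, y = len(image), len(image[0])      # image rows, columns
    n, m = len(kernel), len(kernel[0])    # kernel rows, columns
    output = []
    for row in range(x - n + 1):          # every valid top-left row
        out_row = []
        for col in range(y - m + 1):      # every valid top-left column
            window = [r[col:col + m] for r in image[row:row + n]]
            out_row.append(f(window, kernel))
        output.append(out_row)
    return output


def accesses_per_second(width, height, k, fps):
    """Pixel reads per second when every full k x k window is read in full."""
    windows = (width - k + 1) * (height - k + 1)
    return windows * k * k * fps


def window_sum(window, kernel):
    """Hypothetical window function for the demo: sum of the window's pixels."""
    return sum(sum(r) for r in window)


image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
kernel = [[0, 0],
          [0, 0]]
print(sliding_window(image, kernel, window_sum))  # [[12, 16], [24, 28]]
print(accesses_per_second(1920, 1080, 45, 30))    # 118069812000
```

With a 45x45 kernel on 1920x1080 frames at 30 FPS this gives about 1.18e11 pixel reads per second, in line with the "120 billion" figure quoted on the slide.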
  • 18. APP 1: SUM OF ABSOLUTE DIFFERENCES (SAD) • Used for: H.264 encoding, object identification • Window function: point-wise absolute difference, followed by summation [Figure: window pixels W1..W4 and kernel values k1..k4 give differences d1..d4, then absolute values a1..a4, which are summed into output pixel Ox] Ox = 0; for each pixel i in window: Ox += abs(pixel_i - kernel_i)
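A minimal Python sketch of the SAD window function from slide 18, with a small template-matching demo. The helper names `sad` and `sad_map` and the test data are illustrative assumptions; the lowest score marks the window that best matches the kernel.

```python
def sad(window, kernel):
    """Sum of absolute point-wise differences between window and kernel."""
    return sum(abs(p - k)
               for w_row, k_row in zip(window, kernel)
               for p, k in zip(w_row, k_row))


def sad_map(image, kernel):
    """SAD of the kernel against every full window of the image."""
    n, m = len(kernel), len(kernel[0])
    return [[sad([r[col:col + m] for r in image[row:row + n]], kernel)
             for col in range(len(image[0]) - m + 1)]
            for row in range(len(image) - n + 1)]


image = [[9, 9, 9],
         [9, 1, 2],
         [9, 3, 4]]
template = [[1, 2],
            [3, 4]]
scores = sad_map(image, template)
print(scores)  # [[24, 19], [16, 0]]
```

The zero at position (1, 1) shows the template matching the bottom-right window exactly.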
  • 19. APP 2: 2D CONVOLUTION • Used for: filtering, edge detection • Window function: point-wise product followed by summation [Figure: window pixels W1..W4 multiplied by the reversed kernel k4..k1 give products p1..p4, which are summed into output pixel Ox] Ox = 0; for each pixel i in window: Ox += pixel_i x kernel_(n-i)
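The convolution window function on slide 19 pairs each pixel with the kernel element at the mirrored index. A minimal Python sketch, assuming that in 2D this means flipping the kernel along both axes (the function name `convolve_window` is hypothetical):

```python
def convolve_window(window, kernel):
    """Point-wise product of the window with the flipped kernel, then summed."""
    # Flip the kernel in both dimensions (index i pairs with n - i).
    flipped = [row[::-1] for row in kernel[::-1]]
    return sum(p * k
               for w_row, k_row in zip(window, flipped)
               for p, k in zip(w_row, k_row))


window = [[1, 2],
          [3, 4]]
# A kernel with a single 1 in the top-left picks the bottom-right pixel.
print(convolve_window(window, [[1, 0], [0, 0]]))  # 4
print(convolve_window(window, [[1, 2], [3, 4]]))  # 20
```

Dropping the flip would turn this into cross-correlation, which gives the same result only for kernels that are symmetric under a 180-degree rotation.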
  • 20. APP 3: CORRENTROPY • Used for: optical flow, obstacle avoidance • Window function: Gaussian of point-wise absolute difference, followed by summation [Figure: window pixels W1..W4 and kernel values k1..k4 give absolute differences a1..a4, which pass through a Gaussian function to give g1..g4 and are summed into output pixel Ox] Ox = 0; for each pixel i in window: Ox += Gauss(abs(pixel_i - kernel_i))
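A minimal Python sketch of the correntropy window function from slide 20. The slide does not give the form of the Gaussian, so the unnormalised kernel exp(-d^2 / (2 sigma^2)) and the default `sigma=1.0` used here are assumptions, as are the function names.

```python
import math


def gauss(d, sigma=1.0):
    """Unnormalised Gaussian of a difference d (assumed form, see lead-in)."""
    return math.exp(-(d * d) / (2.0 * sigma * sigma))


def correntropy_window(window, kernel, sigma=1.0):
    """Sum of Gaussians of the point-wise absolute differences."""
    return sum(gauss(abs(p - k), sigma)
               for w_row, k_row in zip(window, kernel)
               for p, k in zip(w_row, k_row))


kernel = [[10, 20],
          [30, 40]]
match = correntropy_window([[10, 20], [30, 40]], kernel)
mismatch = correntropy_window([[90, 80], [70, 60]], kernel)
print(match)             # 4.0: every difference is zero, each term is 1
print(mismatch < match)  # True: large differences contribute almost nothing
```

Unlike SAD, a higher score means a better match, and a single badly mismatched pixel reduces the score by at most one term.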
  • 21. HETEROGENEOUS COMPUTING SYSTEM IN TOP500 LIST • Reason: significant performance/energy-efficiency boost from GPU/CPU [Figure: heterogeneous computing systems in the Top500 by year: 2007: 1, 2008: 8, 2009: 7, 2010: 17, 2011: 39, 2012: 62, 2013: 53, 2014: 75, 2015: 102, 2016 (June): 94]
  • 22. GPU: SPECIALIZED ACCELERATOR FOR A SET OF APPLICATIONS • Specialized accelerator for data-parallel applications • Optimized for processing massive data • Gives up unrelated goals and features • Gives up optimizing latency for processing a single data item • Gives up branch prediction and out-of-order execution • Gives up the large traditional cache hierarchy • More resources for parallel processing • More cores, more ALUs [Figure: data streams feeding parallel cores]
  • 23. CREATING APPLICATION-SPECIFIC ACCELERATOR WITH FPGA • Only provides primitive building blocks for computation • Registers, adders/multipliers, memories, programmable Boolean operations and connections • Build an application-specific accelerator from primitive building blocks • Interconnection between primitive functional units • Timing of data movement between primitive functional units • Opportunities for optimizations for a specific application! • Maximizing efficiency while throwing away redundancy [Figure: Virtex®-7 FPGA resources: precise, low-jitter clocking (MMCMs); logic fabric (LUT-6 CLB); DSP engines (DSP48E1 slices); on-chip memory (36 Kbit/18 Kbit block RAM); enhanced connectivity (PCIe® interface blocks); high-performance parallel I/O connectivity (SelectIO™ technology); high-performance serial I/O connectivity (transceiver technology)]
  • 24. THE CHALLENGES OF PROMOTING FPGAS AMONG SOFTWARE ENGINEERS • Requires tremendous effort • Requires extensive knowledge of digital circuit design • The potential of FPGAs is not easily accessible to typical software engineers [Figure: FPGA design jargon: AXI master, burst inference, DSP48, timing closure, stable interface, loop rewind]
  • 25. HYBRID ARCHITECTURE • Zynq-7000 devices are equipped with dual-core ARM Cortex-A9 processors integrated with 28nm Artix-7 or Kintex®-7 based programmable logic for excellent performance-per-watt and maximum design flexibility. • Sundance's EMC2 carrier board is compatible with all of the Zynq-7000 series.
  • 26. WHAT IS THE BEST SOLUTION? • Zynq® UltraScale+™ MPSoC devices provide 64-bit processor scalability while combining real-time control with soft and hard engines for graphics, video, waveform, and packet processing. • Sundance's EMC2 carrier board is compatible with all Zynq® UltraScale+™ MPSoC devices. Our focus is now on the XCZU4EV device (automotive grade).
  • 27. VCS-1 (EMC2) HARDWARE FEATURES Connectivity: • FM191-R; FMC-LPC to: 15x Digital I/Os [DB9], 12x Analogue Inputs [DB9], 8x Analogue Outputs [DB9], 1x Expansion [SEIC] • FM191-U; SEIC to: 4x USB3.0 [USB-C], 28x GPIO [40-pin GPIO] • FM191-A1; 40-pin GPIO: 28x GPIO [DB9]
  • 28. VCS-1 (EMC2) SENSORS COMPATIBILITY The ZU4EV MPSoC is compatible with a wide range of sensors. • Depth sensor (up to 20m), interface: USB3.0 • Thermal imaging, interface: Ethernet • Movidius Neural Compute Stick, interface: USB3.0 • JAI AD-130GE, interface: Ethernet
  • 29. VCS-1 (EMC2) COMPATIBILITY VCS-1 features: • Raspberry Pi and Arduino compatible • Compatible with most Arduino/RPi sensors and actuators • 4x USB3.0 ports for interfacing with a wide range of sensors • MQTT and OpenCV compatible • ROS compatible • ROS2 ready • HIPPEROS ready
  • 30. DEEP LEARNING ON THE VCS-1 (EMC2) The VCS-1 will be fully compatible with the Xilinx reVISION stack. • Includes support for the most popular neural networks, including AlexNet, GoogLeNet, VGG, SSD, and FCN. • Optimized implementations of CNN network layers, required to build custom neural networks (DNN/CNN) [Figure: Xilinx reVISION stack]
  • 31. VCS-1 (EMC2) OPEN SOURCE SOFTWARE AND FIRMWARE Open source hardware/software and online documentation: • Open Hardware Repository: https://www.ohwr.org/projects/emc2-dp • Microsoft Windows/Linux 64-bit SDK: https://github.com/SundanceMultiprocessorTechnology/VCS-1_SDK • FM191 ARM SDK: https://github.com/SundanceMultiprocessorTechnology/VCS-1_FM191_SDK • FM191 Firmware: https://github.com/SundanceMultiprocessorTechnology/VCS-1_FM191_FW • EMC2 ROS: https://github.com/SundanceMultiprocessorTechnology/VCS-1_emc2_ros
  • 32. VCS-1 (EMC2) ENCLOSURE A custom enclosure was designed specifically to accommodate the VCS-1 system.
  • 33. DISCUSSION The VCS-1 has the following characteristics: 1. High performance (24 V @ 0.595 A) 2. Low power consumption 3. Highly compatible with a wide range of commercially available sensors and actuators 4. Highly optimised for computer vision applications 5. Fully reconfigurable
  • 34. WHAT NEXT? • Be part of our customer family by ordering our products or hiring our services. • Sundance University Program (SUP) • Access to hardware prototypes • Advisory Board Members • BSc/MSc/PhD Internships • Funding capture • H2020 • InnovateUK • EPSRC • Learn more about SUP and ongoing projects: https://www.sundance.com/sundance-in-eu-projects-programs/
  • 37. QUESTIONS? EM3V - Embedded Vision. Sundance Multiprocessor Technology, Ltd. Pedro Machado pedro.m@sundance.com Fatima Kishwar fatima.k@sundance.com Flemming Christensen flemming.c@sundance.com