Tulipp Use Cases
Implementation on embedded platforms
• All use cases started with a reference implementation in a normal
server environment
• Tools used on the embedded platform:
• SDSoC
• With the Tulipp platform installed
• Vivado HLS
• Tulipp tools:
• Sthem
• Lynsyn
• Hipperos OS
Workflow
• Clean the code of library dependencies that are not available on the
embedded platform
• Make it run on the CPU side of the SoC FPGA – handle input/output,
smaller memory footprint, etc.
• Identify sections of the code that are candidates for HW acceleration
• Refactor/restructure the algorithm to optimize for the given
conditions:
• Streaming
• Small local memory
• Preferably no floating point
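The "preferably no floating point" condition is commonly met by moving to fixed-point arithmetic. The following is a minimal sketch of that idea, assuming a Q8.8 format (8 fractional bits); the format and the helper names `to_fixed`, `fixed_mul`, and `to_float` are illustrative assumptions, not taken from the Tulipp use cases.

```python
FRAC_BITS = 8            # assumed Q8.8 format: 8 fractional bits
SCALE = 1 << FRAC_BITS   # 256


def to_fixed(x: float) -> int:
    """Convert a real value to a fixed-point integer (round to nearest)."""
    return int(round(x * SCALE))


def fixed_mul(a: int, b: int) -> int:
    """Multiply two fixed-point values.

    The raw product carries 2 * FRAC_BITS fractional bits, so it is
    shifted right to return to FRAC_BITS.
    """
    return (a * b) >> FRAC_BITS


def to_float(a: int) -> float:
    """Convert a fixed-point integer back to a real value."""
    return a / SCALE


weight = to_fixed(0.75)              # 192
pixel = to_fixed(10.5)               # 2688
product = fixed_mul(weight, pixel)   # integer-only multiply
print(to_float(product))             # 7.875
```

On an FPGA the same operations map to integer multipliers and shifts, which is why such a conversion is usually worth the loss of dynamic range.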
The ADAS use case – pedestrian detection
The Use Case
• Pedestrian detection
• Safety application
• Car integration
Requirements:
• 30 Hz frame rate
• Low latency (2-3 frames)
• No more than 5-10 W
Viola/Jones classification
• Machine learning algorithm based on training with labeled data
• The classifier is the weighted sum of "rectangular features"
• The weights and the choice of features are determined by the training
process
Rectangular features
A classifier consists of a large number of features calculated for a given patch.
If the sum is above a threshold, the patch contains a pedestrian.
A feature is the sum of all pixels in a rectangular region.
Integral images
• In an integral image, each pixel stores
the sum of all pixels to the left of and
above that pixel in the original image
• With an integral image, the sum of all
pixels in an arbitrary rectangle can
easily be calculated with a small, fixed
number of operations
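A minimal sketch of the integral-image idea follows. It uses a common zero-padded convention (one extra row and column of zeros), which is an assumption made here to keep the lookup free of special cases; the function names are illustrative.

```python
def integral_image(img):
    """Build an integral image with one extra row and column of zeros.

    ii[y][x] holds the sum of img[r][c] for all r < y and c < x.
    """
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def rect_sum(ii, x0, y0, x1, y1):
    """Sum of pixels with x0 <= x < x1 and y0 <= y < y1, using four lookups."""
    return ii[y1][x1] - ii[y0][x1] - ii[y1][x0] + ii[y0][x0]


img = [[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]]
ii = integral_image(img)
print(rect_sum(ii, 1, 1, 3, 3))  # 5 + 6 + 8 + 9 = 28
```

Whatever the rectangle size, the cost is four reads and three additions, which is what makes a rectangular feature cheap to evaluate.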
The algorithm
• Input frame: 640 x 480
• Channels computed from the frame: orientation, gradient magnitude, LUV color
• 10 integral images
• Perform detections: sweep the classifier over all pixel positions, for 50 sizes
• Non-maximum suppression
Challenges
• High memory bandwidth requirements – combined with a non-sequential
access pattern
• 30 frames/s
• 50 patch sizes
• Sweeping over all image positions
• Each classifier requires roughly 1000 feature calculations
• Inefficient pipelining, since the classifier calculation can terminate at
any stage
• Not all data can be kept locally (cached)
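To get a feel for the scale of the problem, the numbers on the slides can be multiplied out. This is a rough upper bound, assuming a 640 x 480 frame (from the algorithm slide), every pixel position tested at every patch size, and every classifier evaluated in full; none of the tricks described next are applied.

```python
width, height = 640, 480          # frame size from the algorithm slide
patch_sizes = 50
features_per_classifier = 1000    # "roughly 1000"
frame_rate = 30                   # frames/s

positions = width * height
per_frame = positions * patch_sizes * features_per_classifier
per_second = per_frame * frame_rate

print(f"{per_frame:.3e} feature evaluations per frame")    # 1.536e+10
print(f"{per_second:.3e} feature evaluations per second")  # 4.608e+11
```

Each feature evaluation needs several integral-image lookups, so a brute-force sweep is far beyond what external memory can deliver.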
So we need some tricks
• Cascading – a successively trained classifier chain that focuses on eliminating
non-pedestrians quickly. It reduces the average number of classifier steps by
at least a factor of 10.
• The classifier does not need to test every single position; it can scan in a grid instead
• This results in a need for 5-10 GB/s – random access!
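[Figure: classifier cascade. A patch with a possible pedestrian passes through a chain of stages; each stage can reject it ("No pedestrian"), and a patch that passes all stages is reported ("Pedestrian!").]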
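The early-exit behaviour of a cascade can be sketched as follows. The stage layout (a list of feature indices, weights, and a threshold per stage) and the name `cascade_classify` are assumptions for illustration; the slides do not describe the trained classifier's actual data layout.

```python
def cascade_classify(feature_values, stages):
    """Evaluate a cascade on one patch.

    feature_values: precomputed rectangular feature values for the patch.
    stages: list of (feature_indices, weights, threshold) tuples.
    Returns (is_pedestrian, number_of_features_evaluated).
    """
    evaluated = 0
    for indices, weights, threshold in stages:
        score = sum(w * feature_values[i] for i, w in zip(indices, weights))
        evaluated += len(indices)
        if score <= threshold:
            # Rejected: the remaining stages are never computed.
            return False, evaluated
    return True, evaluated


stages = [
    ([0], [1.0], 0.5),            # cheap first stage
    ([1, 2], [0.5, 0.5], 1.0),    # more expensive second stage
]
print(cascade_classify([0.2, 3.0, 3.0], stages))  # (False, 1)
print(cascade_classify([0.9, 3.0, 3.0], stages))  # (True, 3)
```

Most patches contain no pedestrian and are rejected after a few features, which is where the average saving comes from. The same early exit is also why the work per patch is unpredictable and hard to pipeline.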
Random access to DDR memory is very inefficient
• The trick was to find data requests that fall on the same DDR cache
lines
• That required us to rewrite the algorithm so that it calculates many
classifiers at the same time
• By then reordering all accesses in a cache-friendly manner, the
resulting memory bandwidth increased to almost the same level as for
fully sequential accesses
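The effect of reordering can be illustrated with a deliberately simplified model: consecutive accesses to the same memory line share one fetch, and every change of line costs a new fetch. The 64-byte line size and the function name are assumptions; the slides do not give the actual line size or the reordering scheme used.

```python
from itertools import groupby

LINE_BYTES = 64  # assumed size of one contiguous memory line


def line_fetches(addresses):
    """Count fetches when consecutive accesses to the same line share a fetch."""
    lines = [address // LINE_BYTES for address in addresses]
    return sum(1 for _ in groupby(lines))


# Requests from several classifiers, interleaved as they are generated.
requests = [0, 4096, 8, 4100, 16, 4104]
print(line_fetches(requests))          # 6 fetches in arrival order
print(line_fetches(sorted(requests)))  # 2 fetches after reordering
```

Evaluating many classifiers at once is what creates a pool of pending requests large enough for such grouping to pay off.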
Result
• Reference implementation on PC platform – 10 s/frame
• Final implementation on the Tulipp platform
• 15 frames/s
• Latency of 2-3 frames
The UAV use case
The UAV Use Case objectives
• To perform real-time stereo depth estimation
• To detect obstacles based on the depth estimation and to avoid
collision
• Based on dual cameras forming a stereo pair
• Lower weight and lower price than a depth camera
• Requires real-time performance – high measurement rate and low
latency
Stereo Depth Estimation
• Two cameras with baseline $b$ observe an object $M$ at two different locations $x_1$ and $x_2$
• Depth $Z$ can be computed from the disparity $d = |x_1 - x_2| \propto \frac{1}{Z}$
• Disparity computation requires detecting the same objects in both images
Stereo camera setup
Algorithm Description
• Stereo algorithm with Semi-Global-Matching [1] optimization
[1] H. Hirschmueller, Accurate and efficient stereo processing by semi-global matching and mutual information,
IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), 2005.
Input: Stereo images
1. Image rectification / pre-processing
2. Depth estimation: local matching, then Semi-Global Matching
3. Left-right consistency check
4. Median filtering
Output: Depth map
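The left-right consistency check can be sketched on a single scanline. This is one common formulation, assuming the left image is the reference (left pixel `x` with disparity `d` corresponds to right pixel `x - d`) and a tolerance of one disparity level; both are assumptions, since the slides do not give these details.

```python
INVALID = -1  # marker for disparities that fail the check


def left_right_check(disp_left, disp_right, tolerance=1):
    """Keep a left disparity only if the right image agrees with it."""
    checked = []
    for x, d in enumerate(disp_left):
        xr = x - d  # matching column in the right image
        if 0 <= xr < len(disp_right) and abs(d - disp_right[xr]) <= tolerance:
            checked.append(d)
        else:
            checked.append(INVALID)  # occluded or mismatched pixel
    return checked


print(left_right_check([1, 1, 2, 1], [1, 1, 5, 0]))  # [-1, 1, 2, -1]
```

Pixels marked invalid are typically occlusions or mismatches; the median filter that follows then cleans up remaining isolated outliers.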
Semi-Global Matching
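$$E(D) = \sum_{\mathbf{p}} \Big( \mathrm{Cost}(\mathbf{p}, D_{\mathbf{p}}) + \sum_{\mathbf{q} \in N_{\mathbf{p}}} P_1 \, T\big[|D_{\mathbf{p}} - D_{\mathbf{q}}| = 1\big] + \sum_{\mathbf{q} \in N_{\mathbf{p}}} P_2 \, T\big[|D_{\mathbf{p}} - D_{\mathbf{q}}| > 1\big] \Big), \quad \text{with } P_1 \le P_2$$
Small discontinuity – small penalty ($P_1$)
Large discontinuity – large penalty ($P_2$)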
Aggregation along paths solved using
dynamic programming
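The dynamic-programming aggregation along one path can be sketched as below. It follows the recurrence from the cited SGM paper [1] on a single scanline; the function name and the tiny cost table are illustrative, and a real implementation repeats this for every path direction and sums the results.

```python
def aggregate_path(costs, p1, p2):
    """Aggregate matching costs along one path.

    costs[i][d] is the local matching cost of pixel i at disparity d.
    Returns a table of the same shape holding the path costs.
    """
    n_disp = len(costs[0])
    aggregated = [list(costs[0])]      # first pixel has no predecessor
    for i in range(1, len(costs)):
        prev = aggregated[-1]
        prev_min = min(prev)
        row = []
        for d in range(n_disp):
            candidates = [prev[d], prev_min + p2]   # same disparity, large jump
            if d > 0:
                candidates.append(prev[d - 1] + p1)  # change by one level
            if d < n_disp - 1:
                candidates.append(prev[d + 1] + p1)
            # Subtracting prev_min keeps the values from growing along the path.
            row.append(costs[i][d] + min(candidates) - prev_min)
        aggregated.append(row)
    return aggregated


costs = [[0, 5, 9],
         [6, 1, 7],
         [8, 7, 0]]
print(aggregate_path(costs, p1=1, p2=4))
# [[0, 5, 9], [6, 2, 11], [9, 7, 1]]
```

Each pixel needs only the aggregated costs of its predecessor on the path, which is the property that matters for a streaming hardware implementation.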
Stereo Depth Estimation
The depth $Z$ can be calculated as $Z = f \cdot \frac{b}{d}$,
where $f$ is the focal length, $b$ the distance between the cameras (the baseline), and $d$ the disparity.
[Figure: input image and the corresponding depth image]
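As a small worked example of the depth formula, assuming the focal length and the disparity are both expressed in pixels and the baseline in metres (the numeric values are illustrative, not the UAV camera parameters):

```python
def depth_from_disparity(disparity_px, focal_length_px, baseline_m):
    """Z = f * b / d. Returns None when the disparity is not positive."""
    if disparity_px <= 0:
        return None  # no valid match, or a point at infinity
    return focal_length_px * baseline_m / disparity_px


print(depth_from_disparity(40, focal_length_px=800, baseline_m=0.25))  # 5.0
print(depth_from_disparity(80, focal_length_px=800, baseline_m=0.25))  # 2.5
```

Doubling the disparity halves the depth, so depth resolution is best for nearby obstacles, which is the case that matters for collision avoidance.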
Obstacle Avoidance
Reactive obstacle avoidance algorithm that computes the shortest path around
an obstacle based on the disparity map
1. U- / V-Map computation (Oleynikova et al. 2015)
2. Binary filtering and contour detection
3. Obstacle extraction and waypoint computation
[Figure: U-/V-map, binary filtering, contour detection, obstacle extraction, waypoint computation]
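The first two steps can be sketched as follows. A U-map is a per-column histogram of disparities and a V-map is a per-row histogram; the binary filtering is shown here as a simple count threshold, which is an assumption, as the slides do not specify the filter.

```python
def u_map(disparity, max_disp):
    """u[d][col] = number of pixels in image column col with disparity d."""
    h, w = len(disparity), len(disparity[0])
    u = [[0] * w for _ in range(max_disp + 1)]
    for row in range(h):
        for col in range(w):
            d = disparity[row][col]
            if 0 <= d <= max_disp:
                u[d][col] += 1
    return u


def v_map(disparity, max_disp):
    """v[row][d] = number of pixels in image row `row` with disparity d."""
    v = [[0] * (max_disp + 1) for _ in disparity]
    for row, line in enumerate(disparity):
        for d in line:
            if 0 <= d <= max_disp:
                v[row][d] += 1
    return v


def binarize(hist, min_count):
    """Keep only histogram cells supported by at least min_count pixels."""
    return [[1 if c >= min_count else 0 for c in line] for line in hist]


disparity = [[0, 3, 3],
             [0, 3, 3],
             [1, 3, 0]]
u = u_map(disparity, max_disp=3)
print(u)               # [[2, 0, 1], [1, 0, 0], [0, 0, 0], [0, 3, 2]]
print(binarize(u, 2))  # [[1, 0, 0], [0, 0, 0], [0, 0, 0], [0, 1, 1]]
```

An obstacle at roughly constant depth shows up as a horizontal run in the U-map (here at disparity 3, columns 1 and 2), which is what contour detection then picks up.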
Challenges
• Limited local storage on the FPGA
• Real-time/low-latency requirements
SGM optimization for streaming
• Original algorithm used aggregation
along 8 paths
• That requires access to the full image
• In the FPGA implementation, the full
image can’t be stored locally, hence a
streaming solution would be preferred
• By only aggregating along 4 paths,
streaming can be used.
• Only 1.7% accuracy reduction when
going from 8 to 4 paths
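Why four paths allow streaming can be checked mechanically: when pixels arrive in raster order, a path can be aggregated on the fly only if its predecessor pixel has already been received. The sketch below assumes raster-order input and names the paths by the position of the predecessor; the slides do not state which four paths were kept, so the result is one plausible selection.

```python
# Offset (dx, dy) from the current pixel to its predecessor on each path.
PREDECESSOR_OFFSETS = {
    "left":         (-1,  0),
    "top-left":     (-1, -1),
    "top":          ( 0, -1),
    "top-right":    ( 1, -1),
    "right":        ( 1,  0),
    "bottom-right": ( 1,  1),
    "bottom":       ( 0,  1),
    "bottom-left":  (-1,  1),
}


def available_in_raster_order(offset):
    """True if the predecessor was received before the current pixel."""
    dx, dy = offset
    return dy < 0 or (dy == 0 and dx < 0)


streaming_paths = [name for name, offset in PREDECESSOR_OFFSETS.items()
                   if available_in_raster_order(offset)]
print(streaming_paths)  # ['left', 'top-left', 'top', 'top-right']
```

With these four paths, only the aggregated costs of the previous image row need to be buffered locally instead of the full image.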
Implementation and Results
The disparity estimation is implemented in C/C++ and synthesized to the FPGA using HLS.
The obstacle avoidance is implemented purely on the CPU part of the SoC FPGA.
The medical use case
• Used on X-ray video for surgery
• Lowers the radiation dose by a factor of 4
• Enhances the image quality by denoising and image filtering
• Operates on 1024x1024, 24-bit images at 30 Hz
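The raw input data rate implied by these figures can be worked out directly. The calculation assumes the 24 bits per pixel are stored packed, with no padding; every filter pass that reads or writes a full frame multiplies this rate.

```python
width, height = 1024, 1024
bits_per_pixel = 24
frame_rate = 30  # Hz

bytes_per_frame = width * height * bits_per_pixel // 8
bytes_per_second = bytes_per_frame * frame_rate

print(bytes_per_frame)    # 3145728 bytes, i.e. 3 MiB per frame
print(bytes_per_second)   # 94371840 bytes/s, roughly 94.4 MB/s
```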
Current solution vs the goal
Current X-ray sensor architecture
• Raw image
• PC dedicated to Thales Sensor
• Cleaned & enhanced image
• UI
With Tulipp
• Reduce costs
• Reduce size
• Ease integration
• Choose an MPSoC
Future X-ray sensor architecture
• Nano Processing Unit inside the sensor, based on an SoC (credit-card-size board)
• GigE-Vision + Msg
• Cleaned & enhanced image
Multi pass image filtering
• The image is filtered with several different methods
• Together they:
• Remove sensor defects
• Emphasize low-contrast parts of the image
• Enhance details and edges
• Adapt the image to the final display
Typical processing sequence:
Raw Image
Typical processing sequence:
Clean image stage
• Remove dead pixels
• AGC – Automatic Gain
Control
• ABC – Automatic Brightness
and Contrast – feedback to the X-ray sensor
Typical processing sequence:
Pre-equalization gamma
• Enhances the low-level parts of
the image
• Recursive, temporal filter for
denoising
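A recursive temporal filter can be sketched as a first-order recursion per pixel, $y_t = \alpha x_t + (1 - \alpha) y_{t-1}$. The filter order and the value of $\alpha$ are assumptions for illustration; the slides only state that the filter is recursive and temporal.

```python
def temporal_filter(frames, alpha):
    """Apply y[t] = alpha * x[t] + (1 - alpha) * y[t-1] to every pixel.

    frames: iterable of frames, each a flat list of pixel values.
    Returns the list of filtered frames.
    """
    state = None
    filtered = []
    for frame in frames:
        if state is None:
            state = list(frame)  # the first frame initialises the filter
        else:
            state = [alpha * x + (1 - alpha) * y for x, y in zip(frame, state)]
        filtered.append(state)
    return filtered


frames = [[100.0, 100.0], [108.0, 92.0], [100.0, 100.0]]
print(temporal_filter(frames, alpha=0.25)[-1])  # [101.5, 98.5]
```

Only one frame of state is kept, so the filter suits streaming hardware. A smaller `alpha` suppresses more noise at the cost of more motion blur.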
Typical processing sequence:
Clip & Spatial filters
• Clipping to reduce the signal
levels of the very bright areas
• Spatial filtering for smoothing
(convolution)
Typical processing sequence:
Multiscale contrast & edge enhancement
• Multiscale filtering using a
Laplacian Gaussian pyramid
• Iteratively operates on
downscaled images in a
"pyramid"
• A low-pass filtered image is
subtracted from the original
in each step, to extract the
high-frequency components
• The final image is composed of
the results from each level
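The decomposition described above can be sketched in one dimension. The 3-tap smoothing kernel, the edge handling, and the function names are assumptions; the sketch shows only the analysis side (extracting one high-frequency band per level), not the per-level enhancement or the recomposition of the final image.

```python
def low_pass(signal):
    """Smooth with a [1, 2, 1] / 4 kernel, replicating the edge samples."""
    n = len(signal)
    return [(signal[max(i - 1, 0)] + 2 * signal[i] + signal[min(i + 1, n - 1)]) / 4
            for i in range(n)]


def build_pyramid(signal, levels):
    """Return (details, residual).

    details: one high-frequency band per level.
    residual: the remaining low-resolution signal.
    """
    details = []
    current = list(signal)
    for _ in range(levels):
        smooth = low_pass(current)
        details.append([c - s for c, s in zip(current, smooth)])  # high frequencies
        current = smooth[::2]                                      # downscale by 2
    return details, current


details, residual = build_pyramid([0, 0, 8, 8, 0, 0, 8, 8], levels=2)
print(details[0])  # [0.0, -2.0, 2.0, 2.0, -2.0, -2.0, 2.0, 0.0]
print(details[1])  # [-1.5, 2.5, -2.0, 1.0]
print(residual)    # [1.5, 4.0]
```

Each level is half the size of the previous one, but all levels must be available when the final image is composed.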
Typical processing sequence:
Inversion & auto-adaptive LUT
• Adaptation to the display
Typical processing sequence:
Rotation & Resize
Challenges
• Handling all scales in the pyramid filtering requires much more
memory than is locally available
• Some of the filters had to be redesigned since they had too many
branches, which is a poor fit for hardware streaming solutions
• Implemented from C/C++ using SDSoC
Results
• The algorithm, although slightly modified, runs on the Tulipp
platform:
• 29 frames/s
• 29 ms latency
Conclusion
• The three use cases show that the Tulipp platform performs well for
quite different applications
• The Tulipp tools together with the vendor tools offer a good
development environment, in which you can get efficient FPGA
implementations using high-level tools based on C/C++
• Important to remember – a large portion of the work will (always) be
to refactor/restructure the algorithm to fit the underlying hardware
structure

HiPEAC 2019 Workshop - Use Cases
