SlideShare a Scribd company logo
1 of 50
Download to read offline
www.oerc.ox.ac.uk
AstroAccelerate
GPU accelerated signal processing on the path to the
Square Kilometre Array
Wes Armour, Karel Adamek,
Sofia Dimoudi, Jan Novotny, Nassim Ouannough, Cees Carels
Oxford e-Research Centre,
Department of Engineering Science
University of Oxford
20th March 2019
Part One
A brief introduction to
What is SKA?
What does SKA stand for?
Square Kilometre Array, so called because it
will have an effective collecting area of a
square kilometre.
What is SKA?
SKA is a ground based radio telescope that
will span continents.
Where will SKA be located?
SKA will be built in South Africa and
Australia.
Core
Station
Graphic courtesy of Anne Trefethen
Example of
proposed SKA
configuration
SKA science
SKA will study a wide range of science cases
and aims to answer some of the fundamental
questions mankind has about the universe we
live in.
• How do galaxies evolve
– What is dark energy?
• Tests of General Relativity
– Was Einstein correct?
• Probing the cosmic dawn
– How did stars form?
• The cradle of life
– Are we alone in the Universe?
Part Two
Time domain science
https://commons.wikimedia.org/wiki/File:Planets_and_sun_size_comparison.jpg (Author: Lsmpascal)
Sun
Pulsars – size and scale
Earth
Pulsars are magnetized, rotating neutron
stars which emit synchrotron radiation from
their poles (Crab Nebula). They are typically
1-3 Solar masses in size, have a diameter of
10-20 Kilometres and a pulse period
ranging from milliseconds to seconds.
Their magnetic field is offset from the axis
of rotation so we observe them as cosmic
lighthouses.
Pulsar
Amherst College
Hesteret.al.
Credit: FRB110220 Dan Thornton (Manchester)
SKA time domain science - Fast Radio Bursts
Fast Radio Bursts (FRBs), were first
discovered in 2005 by Lorimer et al.
They are observed as extremely
bright single pulses that are
extremely dispersed (meaning that
they are likely to be far away, maybe
extra galactic).
So far around 15 have been observed
in survey data. They are of unknown
origin, but likely to represent some of
the most extreme physics in our
Universe.
Hence they are extremely interesting
objects to study.Time
Frequency
Part Three
Computing challenges
SKA time domain - data rates
The SKA will produce vast amounts of data. In the
case of time-domain science we expect the
telescope to be able to place ~2000 observing
beams on the sky at any one time (there are
trivially parallel to compute).
The telescope will take 20,000 samples per
second for each of those beams and then it will
measure power in 4096 frequency channels for
each time sample. Each of those individual
samples will comprise of 4x8 bits, although we
are only really interested in one of the 8 bits of
information.
Doing the math tells us that we will need to
process 160GB/s of relevant data. This is
approximately equal to analysing 50 hours of HD
television data per second.
The most costly computational operations
in data processing pipeline are
DDTR ~ O(ndms * nbeams * nsamps * nchans )
FDAS ~ O(ndms * nbeams * nsamps * nacc * log(nsamps) * 1/tobs )
Requiring ~2 PetaFLOP of Compute!
SKA time domain - signal processing
The time domain team is an
international team led by Oxford
and Manchester.
It aims to deliver an end-to-end
signal processing pipeline for
time domain science performed
by SKA (see right).
Our work at OeRC has
focussed on vertical prototyping
activities. We are interested in
using many-core technologies,
such as GPUs to perform the
processing steps within the
signal processing pipeline with
the aim of achieving real-time
processing for the SKA. Image courtesy of Aris Karastergiou
Time Domain Team
Search for periodic signals
search for fast radio bursts
Part Four
GPU accelerated signal processing library
for time-domain radio astronomy.
AstroAccelerate
AstroAccelerate is a GPU enabled
software package that focuses on
achieving real-time processing of
time-domain radio-astronomy data. It
uses the CUDA programming
language for NVIDIA GPUs.
The massive computational power of
modern day GPUs allows the code to
perform algorithms such as de-
dispersion, single pulse searching and
Fourier Domain Acceleration
Searching in real-time on very large
data-sets which are comparable to
those which will be produced by next
generation radio-telescopes such as
the SKA. https://github.com/AstroAccelerateOrg/astro-accelerate
AstroAccelerate - Signal Processing
De-dispersion
Periodicity Search
Harmonic Sum
(Deep dive two)
Fourier Domain Acceleration search
Single Pulse Search
(Deep dive one)
Radio Frequency Interference Mitigation
AstroAccelerate - API
• API follows a simple pattern: configure, bind, run.
• Select which pipeline modules to run, configure module plan, then bind plan to the API.
• API calculates the strategy with the optimal configuration for the plan.
• When all strategy objects are ready, the user selected modules are run within a pipeline.
Select
pipeline
modules Configure
module
plans
bind plan to
API
API
calculates
optimal
strategy
Run
pipeline
Bind input
data to API
C++
/Python
Cees Carels
AstroAccelerate - Code Features
• Usable as a library (.so) and/or standalone executable.
• Examples with instructions on how to compile and link.
• Regular releases (semantic versioning).
• CMake build system.
• Full doxygen documentation and readme.
• Automated CI, unit tests.
Cees Carels
Part Five
Deep dive into recent
work
www.oerc.ox.ac.uk
Karel Adámek, Wes Armour
Single Pulse Detection
Single Pulse Search
Single pulse search (SPS) could be done through matched filters these
are very sensitive but has problem with “quickly”.
Using a Boxcar filter for the single pulse search (SPS):
• Allows us to reuse data
• Independent of pulse shape
• We can trade sensitivity for performance
• Less sensitive by design
Aim is to detect pulses of different shapes and widths at unknown position
within the signal and do it quickly.
Single Pulse Search: How to detect pulses with boxcars
Position of the boxcar is important
We quantify coverage of the pulses by the
distance between boxcar filters L.
• Pulse may end up between boxcars
• By decreasing L we cover pulses better
SNR is
• Increased by adding signal
• Decreased by adding noise
Signal’s strength is measured as signal-to-noise ratio (SNR)
𝑆𝑁𝑅 =
𝑥 − 𝜇
𝜎
,
Where 𝑥 is the sample value, 𝜇 is the mean and 𝜎 is the standard deviation.
Single Pulse Search: How to detect pulses with boxcars
Boxcar which is:
• too short does not cover pulse fully
• too long does add unnecessary noise
We need different boxcar widths W to
better detect different pulse widths.
SNR is
• Increased by adding signal
• Decreased by adding noise
Width of the boxcar filter is also
important
Single Pulse Search: What do we need to do?
Summary:
• Position of the boxcar relative to the
pulse is important. This is expressed by
the distance between boxcars L.
• Boxcar width W is important for
detection of pulses with different
widths.
For ideal detection we need to do:
at every point
Output:
Highest SNR detected at given sample.
• We do not need to keep values of all
boxcar filters just highest SNR!
Single Pulse Search: Two algorithms
BoxDIT
• Starts from ideal Boxcar filter
• Top-down – starts with good
sensitivity but poor performance
• Easily adjustable
• Can be very sensitive
• Not as fast
IGrid
• Start from decimation in time (DIT)
• Bottom-up – starts with good performance
but poor sensitivity
• Less flexible
• Faster
The algorithm must be able to
• perform very long boxcar filters; for
SKA this is 8000+ samples
• Adjustable sensitivity
How to adjust sensitivity
… and increase performance:
• By decreasing/increasing distance
between boxcars L
• By performing more/less boxcars of
different widths W
• After some point it is pointless to
decrease L without more widths W
Single Pulse Search: BoxDIT
BoxDIT has two steps:
• Decimation in time - is used to control
sensitivity
• Ideal boxcar filter (Scan) – is
calculating boxcar filters.
BoxDIT is reusing previously (time)
decimated data to build longer boxcar
widths.
In GPU implementation both steps are
performed at once and kernel calculates
boxcar filters as well as decimation for
next iteration.
BOTTOM: Using combinations of data at
different decimation levels allows us to
construct longer width boxcars.
Diagram of the BoxDIT algorithm.
Single Pulse Search: BoxDIT Scan at every point
Algorithm for scan at every point (applying set of boxcar filters) first calculate small scan
at every point (here 4). The value of the longest boxcar (here 4) is stored into shared
memory.
Stored in registers
Stored into shared memory as well
Single Pulse Search: BoxDIT scanpart
Showing algorithm steps only for every 4th
thread. Other threads doing the same thing
for other points.
Each thread keeps values of boxcar filters in
registers. These are increased with every
step of the algorithm.
In each step i, an active thread A, calculates
a source thread id as Si=A-i*4. The value of
the longest boxcar calculated at beginning
is loaded from source thread (from shared
memory) and used to calculate longer
boxcars in the active thread. These are kept
by the active thread.
The highest SNR from the newly calculated
boxcars is then compared with the SNR of
the source thread and stored at it’s position
in shared memory if higher.
Single Pulse Search: BoxDIT performance
When calculating 32 boxcar per iteration
(1% signal loss, idealised case) code is
limited by compute with 63% device
memory bandwidth utilisation. It is 83x
faster then real-time.
When calculating 16 boxcars per
iteration (2% signal loss, idealised) code
is limited by device memory bandwidth
(86%). It is 170x faster then real time.
Other versions of BoxDIT algorithm
Single Pulse Search: Two algorithms
BoxDIT
• Starts from ideal Boxcar filter
• Top-down – starts with good
sensitivity but poor
performance
• Easily adjustable
• Not as fast
IGrid
• Start from decimation in time (DIT)
• Bottom-up – starts with good
performance but poor sensitivity
• Less flexible
• Faster
Single Pulse Search: IGrid
The IGrid algorithm is based on combination of different decimations in time, which are
shifted in time samples.
Single Pulse Search: IGRID
This algorithm could be interpreted
also as a binary tree where leaves
indicate a shift in number of time
samples.
The binary tree view suggest what
could be calculated locally (red area).
Therefore the only thing shared
through iterations is the time-series
with zero shift.
It also offers a way how to calculate
different boxcar widths, that is by going
to the root of the tree.
Single Pulse Search: IGRID
To calculate individual IGRID iterations
is inefficient and very demanding on
device memory bandwidth.
Thus we calculate multiple IGRID
iterations per thread-block.
Points represent layers that
must be calculated to get desired
sensitivity
must be calculated but it will be
recalculated by the next block
recalculated layer
Is calculated but it is not required
TOP: to calculate individual IGRID
iterations a lot of layers had to be shared
BOTTOM: Each thread-block calculates
multiple iterations.
Single Pulse Search: BoxDIT performance
IGrid 1 – Signal loss ~6%
IGrid 2 – Signal loss ~4.5%
IGrid 3 – Signal loss ~2%
Single Pulse Search: Results
Number of DM
trials/second for SKA-mid
sized data is about ~6000.
This means BOXDIT is
~83x-200x faster then real
time
IGRID is ~150x-580x
faster then real time.
Left: Comparison of
algorithms on average
signal loss and
performance (DM trials).
Conclusions
• Quantified source of sensitivity loss
• Adjustable sensitivity
• Two algorithms with different sensitivity/performance
ratio
• BoxDIT algorithm is in AstroAccelerate and used for
science output
www.oerc.ox.ac.uk
Karel Adámek, Jan Novotný, Wes Armour
Harmonic Sum
Harmonic sum
When searching for pulsars
using Fourier domain
methods the power of the
pulsar is in frequency
domain spread to multiple
frequency bins. The
incoherent harmonic sum
algorithm is one way to
correct this.
TOP: time-series containing
pulsar (dots)
MIDDLE: frequency-domain
harmonics visible
BOTTOM: result of harmonic
sum for two diff. algorithms
Harmonic sum
The goal of the harmonic sum is to sum pulsar’s power that was spread into multiple
harmonics which are integer multiples of the fundamental frequency 𝑓0
h n 𝐻 = ෍
𝑖=1
𝐻
𝑃(𝑖𝑓0) ,
Where H is number of harmonics summed and 𝑃 is the power spectrum.
But we do not know 𝑓0 and we work with discrete indices
fundamental = 10.33Hz first harmonic = 20.66Hz second harmonic = 31Hz
Harmonic sum
The defacto pulsar processing code, PRESTO, uses the following formula:
h n 𝐻 =
1
𝐻
෍
𝑖=1
𝐻
𝑃
𝑛𝑖
𝐻
H is number of harmonics summed.
You can approach it from different end. Start with the index position of the fundamental
frequency n in frequency domain X[n] and add to it higher harmonics, that is X[2n], …
Harmonic sum – Problems
Harmonic sum – Problems
• Unfavorable access pattern
• Difficult data reuse
• Apply physical constrains
• h*(starting index)
• Stride in memory access is h
• Transpose of the data might help
• Simple data reuse is limited
• Cannot increase efficiency by increasing
number of fundamental freq. bins
• Complicated data reuse is resource and
bookkeeping heavy
• Decreases frequency range not index
range which we need to explore.
We aim to provide a selection of algorithm with different ratio of sensitivity/performance
Harmonic sum – Tree view
There are few possibilities how to do harmonic sum
Create all possible sums (exhaustive search) best
possible precision
Harmonic sum – Tree view
There are few possibilities how to do harmonic sum
Create all possible sums (exhaustive search) best
possible precision
Construct the sum from maxima of each harmonic.
That is ℎ 𝑛 𝐻 = σ𝑖=1
𝐻
max
1≤𝑗≤𝑖
𝑥 𝑖𝑛 + 𝑗
Harmonic sum – Tree view
There are few possibilities how to do harmonic sum
Create all possible sums (exhaustive search) best
possible precision
Construct the sum from maxima of each harmonic.
That is ℎ 𝑛 𝐻 = σ𝑖=1
𝐻
max
1≤𝑗≤𝑖
𝑥 𝑖𝑛 + 𝑗
Greedy algorithm which selects highest value
ℎ 𝑛 𝐻+1 = ℎ 𝑛 𝐻 + max 𝑥 𝐻𝑛 + 𝑗 , 𝑥(𝐻𝑛 + 𝑗 + 1)
Harmonic sum – Tree view
There are few possibilities how to do harmonic sum
Create all possible sums (exhaustive search) best
possible precision
Construct the sum from maxima of each harmonic.
That is ℎ 𝑛 𝐻 = σ𝑖=1
𝐻
max
1≤𝑗≤𝑖
𝑥 𝑖𝑛 + 𝑗
Greedy algorithm which selects highest value
ℎ 𝑛 𝐻+1 = ℎ 𝑛 𝐻 + max 𝑥 𝐻𝑛 + 𝑗 , 𝑥(𝐻𝑛 + 𝑗 + 1)
Harmonic sum – Tree view
There are few possibilities how to do harmonic sum
Create all possible sums (exhaustive search) best
possible precision
Construct the sum from maxima of each harmonic.
That is ℎ 𝑛 𝐻 = σ𝑖=1
𝐻
max
1≤𝑗≤𝑖
𝑥 𝑖𝑛 + 𝑗
Greedy algorithm which selects highest value
ℎ 𝑛 𝐻+1 = ℎ 𝑛 𝐻 + max 𝑥 𝐻𝑛 + 𝑗 , 𝑥(𝐻𝑛 + 𝑗 + 1)
Sum only integer multiples of the fundamental
ℎ 𝑛 𝐻 = ෍
𝑖=1
𝐻
𝑥(𝑖𝑛)
Harmonic sum – Simple
• Simple harmonic sum
Data are transposed, which improve
data access
Device mem. bandwidth limited (77%)
No explicit data reuse caching is poor
(10% L2 hit rate)
Memory access pattern when threads process
neighboring fundamental frequency bins.
Memory access pattern when threads process
same fundamental frequency bin from different
data.
Harmonic sum – Greedy
• Greedy harmonic sum
For data reuse we really on caches
(52% L2 hit rate)
– kernel is waiting for data
Device memory Bandwidth utilization
(66%)
Memory access pattern when threads process
neighboring fundamental frequency bins.
Memory access pattern when threads process
same fundamental frequency bin from different
data.
Harmonic sum – Presto, MaxDIT
Presto harmonic sum
Limited by type conversion (array
indexing). However fp32 compute
is still high.
Max harmonic sum
Max harmonic sum is two step
1) Calculate all max decimations -
Limited by compute (Load/Store,
floating point operations)
2) Calculate partial sums and SNR
Harmonic sum – Improvements
Harmonic sum – Results
As gold standard we have used our
implementation of presto‘s HRMS.
RIGHT: Sensitivity loss for pulsar
frequencies which are between
frequency bins. 50% decrease for
simple HRMS, only 30% for Greedy
and PRESTO.
LEFT: average sensitivity as it
depends on pulsar’s frequency.
Harmonic sum – Conclusions
• We have multiple algorithms with different parameters
• We need more sensitivity tests and tests of physical
correctness (artificial data, real data)
• We thinking about 2D harmonic sum for acceleration
searches
• We trying to increase performance

More Related Content

What's hot

TGS Arcis- Canada Arcis Processing Services Brochure
TGS Arcis- Canada Arcis Processing Services BrochureTGS Arcis- Canada Arcis Processing Services Brochure
TGS Arcis- Canada Arcis Processing Services BrochureTGS
 
Mobility collector: Battery Conscious Mobile Tracking
Mobility collector: Battery Conscious Mobile TrackingMobility collector: Battery Conscious Mobile Tracking
Mobility collector: Battery Conscious Mobile TrackingAdrian C. Prelipcean
 
Simulating X-ray Observations with yt
Simulating X-ray Observations with ytSimulating X-ray Observations with yt
Simulating X-ray Observations with ytJohn ZuHone
 
Radar 2009 a 3 review of signals, systems, and dsp
Radar 2009 a  3 review of signals, systems, and dspRadar 2009 a  3 review of signals, systems, and dsp
Radar 2009 a 3 review of signals, systems, and dspsubha5
 
Radar 2009 a 16 parameter estimation and tracking part2
Radar 2009 a 16 parameter estimation and tracking part2Radar 2009 a 16 parameter estimation and tracking part2
Radar 2009 a 16 parameter estimation and tracking part2Forward2025
 
Semi-Analytic Modeling: Creation of the Far-IR Populations
Semi-Analytic Modeling: Creation of the Far-IR PopulationsSemi-Analytic Modeling: Creation of the Far-IR Populations
Semi-Analytic Modeling: Creation of the Far-IR Populationsabenson
 
Academic cloud experiences cern v4
Academic cloud experiences cern v4Academic cloud experiences cern v4
Academic cloud experiences cern v4Tim Bell
 
Producing science with_ptf
Producing science with_ptfProducing science with_ptf
Producing science with_ptfbsesar
 
Ultimate astronomicalimaging
Ultimate astronomicalimagingUltimate astronomicalimaging
Ultimate astronomicalimagingClifford Stone
 
Udacity-Didi Challenge Finalists
Udacity-Didi Challenge FinalistsUdacity-Didi Challenge Finalists
Udacity-Didi Challenge FinalistsDavid Silver
 
SIGGRAPH 2018 - PICA PICA and NVIDIA Turing
SIGGRAPH 2018 - PICA PICA and NVIDIA TuringSIGGRAPH 2018 - PICA PICA and NVIDIA Turing
SIGGRAPH 2018 - PICA PICA and NVIDIA TuringElectronic Arts / DICE
 
TU2.T10.1.pptx
TU2.T10.1.pptxTU2.T10.1.pptx
TU2.T10.1.pptxgrssieee
 
Parametric Time Domain Method for separation of Cloud and Drizzle for ARM Clo...
Parametric Time Domain Method for separation of Cloud and Drizzle for ARM Clo...Parametric Time Domain Method for separation of Cloud and Drizzle for ARM Clo...
Parametric Time Domain Method for separation of Cloud and Drizzle for ARM Clo...Pratik Ramdasi
 
Earth Viewing Infrared Space System Sensor
Earth Viewing Infrared Space System SensorEarth Viewing Infrared Space System Sensor
Earth Viewing Infrared Space System SensorM. Faisal Halim
 
AEROSOL CLASSIFICATION RETRIEVAL ALGORITHMS FOR EARTHCARE/ATLID, CALIPSO/CALI...
AEROSOL CLASSIFICATION RETRIEVAL ALGORITHMS FOR EARTHCARE/ATLID, CALIPSO/CALI...AEROSOL CLASSIFICATION RETRIEVAL ALGORITHMS FOR EARTHCARE/ATLID, CALIPSO/CALI...
AEROSOL CLASSIFICATION RETRIEVAL ALGORITHMS FOR EARTHCARE/ATLID, CALIPSO/CALI...grssieee
 
Test time efficient group delay filter characterization technique using a dis...
Test time efficient group delay filter characterization technique using a dis...Test time efficient group delay filter characterization technique using a dis...
Test time efficient group delay filter characterization technique using a dis...Pete Sarson, PH.D
 
Behavioral modeling of Clock/Data Recovery
Behavioral modeling of Clock/Data RecoveryBehavioral modeling of Clock/Data Recovery
Behavioral modeling of Clock/Data RecoveryArrow Devices
 
Combined Complementary Filter For Inertial Navigation System
Combined Complementary Filter For Inertial Navigation SystemCombined Complementary Filter For Inertial Navigation System
Combined Complementary Filter For Inertial Navigation SystemMykola Novik
 
FR1.L09.2 - ONBOARD RADAR PROCESSING CONCEPTS FOR THE DESDYNI MISSION
FR1.L09.2 - ONBOARD RADAR PROCESSING CONCEPTS FOR THE DESDYNI MISSIONFR1.L09.2 - ONBOARD RADAR PROCESSING CONCEPTS FOR THE DESDYNI MISSION
FR1.L09.2 - ONBOARD RADAR PROCESSING CONCEPTS FOR THE DESDYNI MISSIONgrssieee
 

What's hot (20)

TGS Arcis- Canada Arcis Processing Services Brochure
TGS Arcis- Canada Arcis Processing Services BrochureTGS Arcis- Canada Arcis Processing Services Brochure
TGS Arcis- Canada Arcis Processing Services Brochure
 
Mobility collector: Battery Conscious Mobile Tracking
Mobility collector: Battery Conscious Mobile TrackingMobility collector: Battery Conscious Mobile Tracking
Mobility collector: Battery Conscious Mobile Tracking
 
Simulating X-ray Observations with yt
Simulating X-ray Observations with ytSimulating X-ray Observations with yt
Simulating X-ray Observations with yt
 
Radar 2009 a 3 review of signals, systems, and dsp
Radar 2009 a  3 review of signals, systems, and dspRadar 2009 a  3 review of signals, systems, and dsp
Radar 2009 a 3 review of signals, systems, and dsp
 
Prof Osamu Tajima
Prof Osamu TajimaProf Osamu Tajima
Prof Osamu Tajima
 
Radar 2009 a 16 parameter estimation and tracking part2
Radar 2009 a 16 parameter estimation and tracking part2Radar 2009 a 16 parameter estimation and tracking part2
Radar 2009 a 16 parameter estimation and tracking part2
 
Semi-Analytic Modeling: Creation of the Far-IR Populations
Semi-Analytic Modeling: Creation of the Far-IR PopulationsSemi-Analytic Modeling: Creation of the Far-IR Populations
Semi-Analytic Modeling: Creation of the Far-IR Populations
 
Academic cloud experiences cern v4
Academic cloud experiences cern v4Academic cloud experiences cern v4
Academic cloud experiences cern v4
 
Producing science with_ptf
Producing science with_ptfProducing science with_ptf
Producing science with_ptf
 
Ultimate astronomicalimaging
Ultimate astronomicalimagingUltimate astronomicalimaging
Ultimate astronomicalimaging
 
Udacity-Didi Challenge Finalists
Udacity-Didi Challenge FinalistsUdacity-Didi Challenge Finalists
Udacity-Didi Challenge Finalists
 
SIGGRAPH 2018 - PICA PICA and NVIDIA Turing
SIGGRAPH 2018 - PICA PICA and NVIDIA TuringSIGGRAPH 2018 - PICA PICA and NVIDIA Turing
SIGGRAPH 2018 - PICA PICA and NVIDIA Turing
 
TU2.T10.1.pptx
TU2.T10.1.pptxTU2.T10.1.pptx
TU2.T10.1.pptx
 
Parametric Time Domain Method for separation of Cloud and Drizzle for ARM Clo...
Parametric Time Domain Method for separation of Cloud and Drizzle for ARM Clo...Parametric Time Domain Method for separation of Cloud and Drizzle for ARM Clo...
Parametric Time Domain Method for separation of Cloud and Drizzle for ARM Clo...
 
Earth Viewing Infrared Space System Sensor
Earth Viewing Infrared Space System SensorEarth Viewing Infrared Space System Sensor
Earth Viewing Infrared Space System Sensor
 
AEROSOL CLASSIFICATION RETRIEVAL ALGORITHMS FOR EARTHCARE/ATLID, CALIPSO/CALI...
AEROSOL CLASSIFICATION RETRIEVAL ALGORITHMS FOR EARTHCARE/ATLID, CALIPSO/CALI...AEROSOL CLASSIFICATION RETRIEVAL ALGORITHMS FOR EARTHCARE/ATLID, CALIPSO/CALI...
AEROSOL CLASSIFICATION RETRIEVAL ALGORITHMS FOR EARTHCARE/ATLID, CALIPSO/CALI...
 
Test time efficient group delay filter characterization technique using a dis...
Test time efficient group delay filter characterization technique using a dis...Test time efficient group delay filter characterization technique using a dis...
Test time efficient group delay filter characterization technique using a dis...
 
Behavioral modeling of Clock/Data Recovery
Behavioral modeling of Clock/Data RecoveryBehavioral modeling of Clock/Data Recovery
Behavioral modeling of Clock/Data Recovery
 
Combined Complementary Filter For Inertial Navigation System
Combined Complementary Filter For Inertial Navigation SystemCombined Complementary Filter For Inertial Navigation System
Combined Complementary Filter For Inertial Navigation System
 
FR1.L09.2 - ONBOARD RADAR PROCESSING CONCEPTS FOR THE DESDYNI MISSION
FR1.L09.2 - ONBOARD RADAR PROCESSING CONCEPTS FOR THE DESDYNI MISSIONFR1.L09.2 - ONBOARD RADAR PROCESSING CONCEPTS FOR THE DESDYNI MISSION
FR1.L09.2 - ONBOARD RADAR PROCESSING CONCEPTS FOR THE DESDYNI MISSION
 

Similar to AstroAccelerate - GPU Accelerated Signal Processing on the Path to the Square Kilometre Array

Arduino-KickSat-ArduSat Knowledge Update
Arduino-KickSat-ArduSat Knowledge Update Arduino-KickSat-ArduSat Knowledge Update
Arduino-KickSat-ArduSat Knowledge Update PROSSATeam
 
Algoritmo de detecção de Pulso
Algoritmo de detecção de PulsoAlgoritmo de detecção de Pulso
Algoritmo de detecção de PulsoDanielFiuza8
 
DIGITAL V/S ANALOGUE OSCILLOSCOPE
DIGITAL V/S ANALOGUE OSCILLOSCOPE DIGITAL V/S ANALOGUE OSCILLOSCOPE
DIGITAL V/S ANALOGUE OSCILLOSCOPE Ikram Arshad
 
Astronomical Data Processing on the LSST Scale with Apache Spark
Astronomical Data Processing on the LSST Scale with Apache SparkAstronomical Data Processing on the LSST Scale with Apache Spark
Astronomical Data Processing on the LSST Scale with Apache SparkDatabricks
 
e-Science for the Science Kilometre Array
e-Science for the Science Kilometre Arraye-Science for the Science Kilometre Array
e-Science for the Science Kilometre ArrayJoint ALMA Observatory
 
The Square Kilometre Array: Overview and Engineering Update
The Square Kilometre Array: Overview and Engineering UpdateThe Square Kilometre Array: Overview and Engineering Update
The Square Kilometre Array: Overview and Engineering UpdateJoint ALMA Observatory
 
Distance measuring unit with zigbee protocol, Ultra sonic sensor
Distance measuring unit with zigbee protocol, Ultra sonic sensorDistance measuring unit with zigbee protocol, Ultra sonic sensor
Distance measuring unit with zigbee protocol, Ultra sonic sensorAshok Raj
 
COMPUTED TOMOGRAPHY by Heena-1.pptx
COMPUTED TOMOGRAPHY by Heena-1.pptxCOMPUTED TOMOGRAPHY by Heena-1.pptx
COMPUTED TOMOGRAPHY by Heena-1.pptxKhanZynab
 
Accelerating Astronomical Discoveries with Apache Spark
Accelerating Astronomical Discoveries with Apache SparkAccelerating Astronomical Discoveries with Apache Spark
Accelerating Astronomical Discoveries with Apache SparkDatabricks
 
A Low Power & Low Noise Multi-Channel ASIC for X-Ray and Gamma-Ray Spectroscopy
A Low Power & Low Noise Multi-Channel ASIC for X-Ray and Gamma-Ray SpectroscopyA Low Power & Low Noise Multi-Channel ASIC for X-Ray and Gamma-Ray Spectroscopy
A Low Power & Low Noise Multi-Channel ASIC for X-Ray and Gamma-Ray SpectroscopyGunnar Maehlum
 
HPC Clusters in the (almost) Infinite Cloud
HPC Clusters in the (almost) Infinite CloudHPC Clusters in the (almost) Infinite Cloud
HPC Clusters in the (almost) Infinite CloudAmazon Web Services
 
Toward Real-Time Analysis of Large Data Volumes for Diffraction Studies by Ma...
Toward Real-Time Analysis of Large Data Volumes for Diffraction Studies by Ma...Toward Real-Time Analysis of Large Data Volumes for Diffraction Studies by Ma...
Toward Real-Time Analysis of Large Data Volumes for Diffraction Studies by Ma...EarthCube
 
The Search for Gravitational Waves
The Search for Gravitational WavesThe Search for Gravitational Waves
The Search for Gravitational Wavesinside-BigData.com
 
Burst data retrieval after 50k GPU Cloud run
Burst data retrieval after 50k GPU Cloud runBurst data retrieval after 50k GPU Cloud run
Burst data retrieval after 50k GPU Cloud runIgor Sfiligoi
 
A Conceptual Design for a Large Ground Array of Fluorescence Detectors
A Conceptual Design for a Large Ground Array of Fluorescence DetectorsA Conceptual Design for a Large Ground Array of Fluorescence Detectors
A Conceptual Design for a Large Ground Array of Fluorescence DetectorsToshihiro FUJII
 
Combining remote sensing earth observations and in situ networks: detection o...
Combining remote sensing earth observations and in situ networks: detection o...Combining remote sensing earth observations and in situ networks: detection o...
Combining remote sensing earth observations and in situ networks: detection o...Integrated Carbon Observation System (ICOS)
 
Frossie Economou & Angelo Fausti [Vera C. Rubin Observatory] | How InfluxDB H...
Frossie Economou & Angelo Fausti [Vera C. Rubin Observatory] | How InfluxDB H...Frossie Economou & Angelo Fausti [Vera C. Rubin Observatory] | How InfluxDB H...
Frossie Economou & Angelo Fausti [Vera C. Rubin Observatory] | How InfluxDB H...InfluxData
 

Similar to AstroAccelerate - GPU Accelerated Signal Processing on the Path to the Square Kilometre Array (20)

Arduino-KickSat-ArduSat Knowledge Update
Arduino-KickSat-ArduSat Knowledge Update Arduino-KickSat-ArduSat Knowledge Update
Arduino-KickSat-ArduSat Knowledge Update
 
Algoritmo de detecção de Pulso
Algoritmo de detecção de PulsoAlgoritmo de detecção de Pulso
Algoritmo de detecção de Pulso
 
Hl3413921395
Hl3413921395Hl3413921395
Hl3413921395
 
DIGITAL V/S ANALOGUE OSCILLOSCOPE
DIGITAL V/S ANALOGUE OSCILLOSCOPE DIGITAL V/S ANALOGUE OSCILLOSCOPE
DIGITAL V/S ANALOGUE OSCILLOSCOPE
 
Astronomical Data Processing on the LSST Scale with Apache Spark
Astronomical Data Processing on the LSST Scale with Apache SparkAstronomical Data Processing on the LSST Scale with Apache Spark
Astronomical Data Processing on the LSST Scale with Apache Spark
 
e-Science for the Science Kilometre Array
e-Science for the Science Kilometre Arraye-Science for the Science Kilometre Array
e-Science for the Science Kilometre Array
 
The Square Kilometre Array: Overview and Engineering Update
The Square Kilometre Array: Overview and Engineering UpdateThe Square Kilometre Array: Overview and Engineering Update
The Square Kilometre Array: Overview and Engineering Update
 
Distance measuring unit with zigbee protocol, Ultra sonic sensor
Distance measuring unit with zigbee protocol, Ultra sonic sensorDistance measuring unit with zigbee protocol, Ultra sonic sensor
Distance measuring unit with zigbee protocol, Ultra sonic sensor
 
COMPUTED TOMOGRAPHY by Heena-1.pptx
COMPUTED TOMOGRAPHY by Heena-1.pptxCOMPUTED TOMOGRAPHY by Heena-1.pptx
COMPUTED TOMOGRAPHY by Heena-1.pptx
 
FermiPoster
FermiPosterFermiPoster
FermiPoster
 
Accelerating Astronomical Discoveries with Apache Spark
Accelerating Astronomical Discoveries with Apache SparkAccelerating Astronomical Discoveries with Apache Spark
Accelerating Astronomical Discoveries with Apache Spark
 
A Low Power & Low Noise Multi-Channel ASIC for X-Ray and Gamma-Ray Spectroscopy
A Low Power & Low Noise Multi-Channel ASIC for X-Ray and Gamma-Ray SpectroscopyA Low Power & Low Noise Multi-Channel ASIC for X-Ray and Gamma-Ray Spectroscopy
A Low Power & Low Noise Multi-Channel ASIC for X-Ray and Gamma-Ray Spectroscopy
 
HPC Clusters in the (almost) Infinite Cloud
HPC Clusters in the (almost) Infinite CloudHPC Clusters in the (almost) Infinite Cloud
HPC Clusters in the (almost) Infinite Cloud
 
Toward Real-Time Analysis of Large Data Volumes for Diffraction Studies by Ma...
Toward Real-Time Analysis of Large Data Volumes for Diffraction Studies by Ma...Toward Real-Time Analysis of Large Data Volumes for Diffraction Studies by Ma...
Toward Real-Time Analysis of Large Data Volumes for Diffraction Studies by Ma...
 
a63_Geer
a63_Geera63_Geer
a63_Geer
 
The Search for Gravitational Waves
The Search for Gravitational WavesThe Search for Gravitational Waves
The Search for Gravitational Waves
 
Burst data retrieval after 50k GPU Cloud run
Burst data retrieval after 50k GPU Cloud runBurst data retrieval after 50k GPU Cloud run
Burst data retrieval after 50k GPU Cloud run
 
A Conceptual Design for a Large Ground Array of Fluorescence Detectors
A Conceptual Design for a Large Ground Array of Fluorescence DetectorsA Conceptual Design for a Large Ground Array of Fluorescence Detectors
A Conceptual Design for a Large Ground Array of Fluorescence Detectors
 
Combining remote sensing earth observations and in situ networks: detection o...
Combining remote sensing earth observations and in situ networks: detection o...Combining remote sensing earth observations and in situ networks: detection o...
Combining remote sensing earth observations and in situ networks: detection o...
 
Frossie Economou & Angelo Fausti [Vera C. Rubin Observatory] | How InfluxDB H...
Frossie Economou & Angelo Fausti [Vera C. Rubin Observatory] | How InfluxDB H...Frossie Economou & Angelo Fausti [Vera C. Rubin Observatory] | How InfluxDB H...
Frossie Economou & Angelo Fausti [Vera C. Rubin Observatory] | How InfluxDB H...
 

More from inside-BigData.com

Preparing to program Aurora at Exascale - Early experiences and future direct...
Preparing to program Aurora at Exascale - Early experiences and future direct...Preparing to program Aurora at Exascale - Early experiences and future direct...
Preparing to program Aurora at Exascale - Early experiences and future direct...inside-BigData.com
 
Transforming Private 5G Networks
Transforming Private 5G NetworksTransforming Private 5G Networks
Transforming Private 5G Networksinside-BigData.com
 
The Incorporation of Machine Learning into Scientific Simulations at Lawrence...
The Incorporation of Machine Learning into Scientific Simulations at Lawrence...The Incorporation of Machine Learning into Scientific Simulations at Lawrence...
The Incorporation of Machine Learning into Scientific Simulations at Lawrence...inside-BigData.com
 
How to Achieve High-Performance, Scalable and Distributed DNN Training on Mod...
How to Achieve High-Performance, Scalable and Distributed DNN Training on Mod...How to Achieve High-Performance, Scalable and Distributed DNN Training on Mod...
How to Achieve High-Performance, Scalable and Distributed DNN Training on Mod...inside-BigData.com
 
Evolving Cyberinfrastructure, Democratizing Data, and Scaling AI to Catalyze ...
Evolving Cyberinfrastructure, Democratizing Data, and Scaling AI to Catalyze ...Evolving Cyberinfrastructure, Democratizing Data, and Scaling AI to Catalyze ...
Evolving Cyberinfrastructure, Democratizing Data, and Scaling AI to Catalyze ...inside-BigData.com
 
HPC Impact: EDA Telemetry Neural Networks
HPC Impact: EDA Telemetry Neural NetworksHPC Impact: EDA Telemetry Neural Networks
HPC Impact: EDA Telemetry Neural Networksinside-BigData.com
 
Biohybrid Robotic Jellyfish for Future Applications in Ocean Monitoring
Biohybrid Robotic Jellyfish for Future Applications in Ocean MonitoringBiohybrid Robotic Jellyfish for Future Applications in Ocean Monitoring
Biohybrid Robotic Jellyfish for Future Applications in Ocean Monitoringinside-BigData.com
 
Machine Learning for Weather Forecasts
Machine Learning for Weather ForecastsMachine Learning for Weather Forecasts
Machine Learning for Weather Forecastsinside-BigData.com
 
HPC AI Advisory Council Update
HPC AI Advisory Council UpdateHPC AI Advisory Council Update
HPC AI Advisory Council Updateinside-BigData.com
 
Fugaku Supercomputer joins fight against COVID-19
Fugaku Supercomputer joins fight against COVID-19Fugaku Supercomputer joins fight against COVID-19
Fugaku Supercomputer joins fight against COVID-19inside-BigData.com
 
Energy Efficient Computing using Dynamic Tuning
Energy Efficient Computing using Dynamic TuningEnergy Efficient Computing using Dynamic Tuning
Energy Efficient Computing using Dynamic Tuninginside-BigData.com
 
HPC at Scale Enabled by DDN A3i and NVIDIA SuperPOD
HPC at Scale Enabled by DDN A3i and NVIDIA SuperPODHPC at Scale Enabled by DDN A3i and NVIDIA SuperPOD
HPC at Scale Enabled by DDN A3i and NVIDIA SuperPODinside-BigData.com
 
Versal Premium ACAP for Network and Cloud Acceleration
Versal Premium ACAP for Network and Cloud AccelerationVersal Premium ACAP for Network and Cloud Acceleration
Versal Premium ACAP for Network and Cloud Accelerationinside-BigData.com
 
Zettar: Moving Massive Amounts of Data across Any Distance Efficiently
Zettar: Moving Massive Amounts of Data across Any Distance EfficientlyZettar: Moving Massive Amounts of Data across Any Distance Efficiently
Zettar: Moving Massive Amounts of Data across Any Distance Efficientlyinside-BigData.com
 
Scaling TCO in a Post Moore's Era
Scaling TCO in a Post Moore's EraScaling TCO in a Post Moore's Era
Scaling TCO in a Post Moore's Erainside-BigData.com
 
CUDA-Python and RAPIDS for blazing fast scientific computing
CUDA-Python and RAPIDS for blazing fast scientific computingCUDA-Python and RAPIDS for blazing fast scientific computing
CUDA-Python and RAPIDS for blazing fast scientific computinginside-BigData.com
 
Introducing HPC with a Raspberry Pi Cluster
Introducing HPC with a Raspberry Pi ClusterIntroducing HPC with a Raspberry Pi Cluster
Introducing HPC with a Raspberry Pi Clusterinside-BigData.com
 

More from inside-BigData.com (20)

Major Market Shifts in IT
Major Market Shifts in ITMajor Market Shifts in IT
Major Market Shifts in IT
 
Preparing to program Aurora at Exascale - Early experiences and future direct...
Preparing to program Aurora at Exascale - Early experiences and future direct...Preparing to program Aurora at Exascale - Early experiences and future direct...
Preparing to program Aurora at Exascale - Early experiences and future direct...
 
Transforming Private 5G Networks
Transforming Private 5G NetworksTransforming Private 5G Networks
Transforming Private 5G Networks
 
The Incorporation of Machine Learning into Scientific Simulations at Lawrence...
The Incorporation of Machine Learning into Scientific Simulations at Lawrence...The Incorporation of Machine Learning into Scientific Simulations at Lawrence...
The Incorporation of Machine Learning into Scientific Simulations at Lawrence...
 
How to Achieve High-Performance, Scalable and Distributed DNN Training on Mod...
How to Achieve High-Performance, Scalable and Distributed DNN Training on Mod...How to Achieve High-Performance, Scalable and Distributed DNN Training on Mod...
How to Achieve High-Performance, Scalable and Distributed DNN Training on Mod...
 
Evolving Cyberinfrastructure, Democratizing Data, and Scaling AI to Catalyze ...
Evolving Cyberinfrastructure, Democratizing Data, and Scaling AI to Catalyze ...Evolving Cyberinfrastructure, Democratizing Data, and Scaling AI to Catalyze ...
Evolving Cyberinfrastructure, Democratizing Data, and Scaling AI to Catalyze ...
 
HPC Impact: EDA Telemetry Neural Networks
HPC Impact: EDA Telemetry Neural NetworksHPC Impact: EDA Telemetry Neural Networks
HPC Impact: EDA Telemetry Neural Networks
 
Biohybrid Robotic Jellyfish for Future Applications in Ocean Monitoring
Biohybrid Robotic Jellyfish for Future Applications in Ocean MonitoringBiohybrid Robotic Jellyfish for Future Applications in Ocean Monitoring
Biohybrid Robotic Jellyfish for Future Applications in Ocean Monitoring
 
Machine Learning for Weather Forecasts
Machine Learning for Weather ForecastsMachine Learning for Weather Forecasts
Machine Learning for Weather Forecasts
 
HPC AI Advisory Council Update
HPC AI Advisory Council UpdateHPC AI Advisory Council Update
HPC AI Advisory Council Update
 
Fugaku Supercomputer joins fight against COVID-19
Fugaku Supercomputer joins fight against COVID-19Fugaku Supercomputer joins fight against COVID-19
Fugaku Supercomputer joins fight against COVID-19
 
Energy Efficient Computing using Dynamic Tuning
Energy Efficient Computing using Dynamic TuningEnergy Efficient Computing using Dynamic Tuning
Energy Efficient Computing using Dynamic Tuning
 
HPC at Scale Enabled by DDN A3i and NVIDIA SuperPOD
HPC at Scale Enabled by DDN A3i and NVIDIA SuperPODHPC at Scale Enabled by DDN A3i and NVIDIA SuperPOD
HPC at Scale Enabled by DDN A3i and NVIDIA SuperPOD
 
State of ARM-based HPC
State of ARM-based HPCState of ARM-based HPC
State of ARM-based HPC
 
Versal Premium ACAP for Network and Cloud Acceleration
Versal Premium ACAP for Network and Cloud AccelerationVersal Premium ACAP for Network and Cloud Acceleration
Versal Premium ACAP for Network and Cloud Acceleration
 
Zettar: Moving Massive Amounts of Data across Any Distance Efficiently
Zettar: Moving Massive Amounts of Data across Any Distance EfficientlyZettar: Moving Massive Amounts of Data across Any Distance Efficiently
Zettar: Moving Massive Amounts of Data across Any Distance Efficiently
 
Scaling TCO in a Post Moore's Era
Scaling TCO in a Post Moore's EraScaling TCO in a Post Moore's Era
Scaling TCO in a Post Moore's Era
 
CUDA-Python and RAPIDS for blazing fast scientific computing
CUDA-Python and RAPIDS for blazing fast scientific computingCUDA-Python and RAPIDS for blazing fast scientific computing
CUDA-Python and RAPIDS for blazing fast scientific computing
 
Introducing HPC with a Raspberry Pi Cluster
Introducing HPC with a Raspberry Pi ClusterIntroducing HPC with a Raspberry Pi Cluster
Introducing HPC with a Raspberry Pi Cluster
 
Overview of HPC Interconnects
Overview of HPC InterconnectsOverview of HPC Interconnects
Overview of HPC Interconnects
 

Recently uploaded

Human Expert Website Manual WCAG 2.0 2.1 2.2 Audit - Digital Accessibility Au...
Human Expert Website Manual WCAG 2.0 2.1 2.2 Audit - Digital Accessibility Au...Human Expert Website Manual WCAG 2.0 2.1 2.2 Audit - Digital Accessibility Au...
Human Expert Website Manual WCAG 2.0 2.1 2.2 Audit - Digital Accessibility Au...Skynet Technologies
 
Top 10 CodeIgniter Development Companies
Top 10 CodeIgniter Development CompaniesTop 10 CodeIgniter Development Companies
Top 10 CodeIgniter Development CompaniesTopCSSGallery
 
Event-Driven Architecture Masterclass: Integrating Distributed Data Stores Ac...
Event-Driven Architecture Masterclass: Integrating Distributed Data Stores Ac...Event-Driven Architecture Masterclass: Integrating Distributed Data Stores Ac...
Event-Driven Architecture Masterclass: Integrating Distributed Data Stores Ac...ScyllaDB
 
AI mind or machine power point presentation
AI mind or machine power point presentationAI mind or machine power point presentation
AI mind or machine power point presentationyogeshlabana357357
 
Vector Search @ sw2con for slideshare.pptx
Vector Search @ sw2con for slideshare.pptxVector Search @ sw2con for slideshare.pptx
Vector Search @ sw2con for slideshare.pptxjbellis
 
ADP Passwordless Journey Case Study.pptx
ADP Passwordless Journey Case Study.pptxADP Passwordless Journey Case Study.pptx
ADP Passwordless Journey Case Study.pptxFIDO Alliance
 
Six Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal OntologySix Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal Ontologyjohnbeverley2021
 
TrustArc Webinar - Unified Trust Center for Privacy, Security, Compliance, an...
TrustArc Webinar - Unified Trust Center for Privacy, Security, Compliance, an...TrustArc Webinar - Unified Trust Center for Privacy, Security, Compliance, an...
TrustArc Webinar - Unified Trust Center for Privacy, Security, Compliance, an...TrustArc
 
Observability Concepts EVERY Developer Should Know (DevOpsDays Seattle)
Observability Concepts EVERY Developer Should Know (DevOpsDays Seattle)Observability Concepts EVERY Developer Should Know (DevOpsDays Seattle)
Observability Concepts EVERY Developer Should Know (DevOpsDays Seattle)Paige Cruz
 
Intro to Passkeys and the State of Passwordless.pptx
Intro to Passkeys and the State of Passwordless.pptxIntro to Passkeys and the State of Passwordless.pptx
Intro to Passkeys and the State of Passwordless.pptxFIDO Alliance
 
AI+A11Y 11MAY2024 HYDERBAD GAAD 2024 - HelloA11Y (11 May 2024)
AI+A11Y 11MAY2024 HYDERBAD GAAD 2024 - HelloA11Y (11 May 2024)AI+A11Y 11MAY2024 HYDERBAD GAAD 2024 - HelloA11Y (11 May 2024)
AI+A11Y 11MAY2024 HYDERBAD GAAD 2024 - HelloA11Y (11 May 2024)Samir Dash
 
ERP Contender Series: Acumatica vs. Sage Intacct
ERP Contender Series: Acumatica vs. Sage IntacctERP Contender Series: Acumatica vs. Sage Intacct
ERP Contender Series: Acumatica vs. Sage IntacctBrainSell Technologies
 
Frisco Automating Purchase Orders with MuleSoft IDP- May 10th, 2024.pptx.pdf
Frisco Automating Purchase Orders with MuleSoft IDP- May 10th, 2024.pptx.pdfFrisco Automating Purchase Orders with MuleSoft IDP- May 10th, 2024.pptx.pdf
Frisco Automating Purchase Orders with MuleSoft IDP- May 10th, 2024.pptx.pdfAnubhavMangla3
 
WebRTC and SIP not just audio and video @ OpenSIPS 2024
WebRTC and SIP not just audio and video @ OpenSIPS 2024WebRTC and SIP not just audio and video @ OpenSIPS 2024
WebRTC and SIP not just audio and video @ OpenSIPS 2024Lorenzo Miniero
 
JohnPollard-hybrid-app-RailsConf2024.pptx
JohnPollard-hybrid-app-RailsConf2024.pptxJohnPollard-hybrid-app-RailsConf2024.pptx
JohnPollard-hybrid-app-RailsConf2024.pptxJohnPollard37
 
Event-Driven Architecture Masterclass: Challenges in Stream Processing
Event-Driven Architecture Masterclass: Challenges in Stream ProcessingEvent-Driven Architecture Masterclass: Challenges in Stream Processing
Event-Driven Architecture Masterclass: Challenges in Stream ProcessingScyllaDB
 
Tales from a Passkey Provider Progress from Awareness to Implementation.pptx
Tales from a Passkey Provider  Progress from Awareness to Implementation.pptxTales from a Passkey Provider  Progress from Awareness to Implementation.pptx
Tales from a Passkey Provider Progress from Awareness to Implementation.pptxFIDO Alliance
 
Easier, Faster, and More Powerful – Alles Neu macht der Mai -Wir durchleuchte...
Easier, Faster, and More Powerful – Alles Neu macht der Mai -Wir durchleuchte...Easier, Faster, and More Powerful – Alles Neu macht der Mai -Wir durchleuchte...
Easier, Faster, and More Powerful – Alles Neu macht der Mai -Wir durchleuchte...panagenda
 
Hyatt driving innovation and exceptional customer experiences with FIDO passw...
Hyatt driving innovation and exceptional customer experiences with FIDO passw...Hyatt driving innovation and exceptional customer experiences with FIDO passw...
Hyatt driving innovation and exceptional customer experiences with FIDO passw...FIDO Alliance
 
Design and Development of a Provenance Capture Platform for Data Science
Design and Development of a Provenance Capture Platform for Data ScienceDesign and Development of a Provenance Capture Platform for Data Science
Design and Development of a Provenance Capture Platform for Data SciencePaolo Missier
 

Recently uploaded (20)

Human Expert Website Manual WCAG 2.0 2.1 2.2 Audit - Digital Accessibility Au...
Human Expert Website Manual WCAG 2.0 2.1 2.2 Audit - Digital Accessibility Au...Human Expert Website Manual WCAG 2.0 2.1 2.2 Audit - Digital Accessibility Au...
Human Expert Website Manual WCAG 2.0 2.1 2.2 Audit - Digital Accessibility Au...
 
Top 10 CodeIgniter Development Companies
Top 10 CodeIgniter Development CompaniesTop 10 CodeIgniter Development Companies
Top 10 CodeIgniter Development Companies
 
Event-Driven Architecture Masterclass: Integrating Distributed Data Stores Ac...
Event-Driven Architecture Masterclass: Integrating Distributed Data Stores Ac...Event-Driven Architecture Masterclass: Integrating Distributed Data Stores Ac...
Event-Driven Architecture Masterclass: Integrating Distributed Data Stores Ac...
 
AI mind or machine power point presentation
AI mind or machine power point presentationAI mind or machine power point presentation
AI mind or machine power point presentation
 
Vector Search @ sw2con for slideshare.pptx
Vector Search @ sw2con for slideshare.pptxVector Search @ sw2con for slideshare.pptx
Vector Search @ sw2con for slideshare.pptx
 
ADP Passwordless Journey Case Study.pptx
ADP Passwordless Journey Case Study.pptxADP Passwordless Journey Case Study.pptx
ADP Passwordless Journey Case Study.pptx
 
Six Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal OntologySix Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal Ontology
 
TrustArc Webinar - Unified Trust Center for Privacy, Security, Compliance, an...
TrustArc Webinar - Unified Trust Center for Privacy, Security, Compliance, an...TrustArc Webinar - Unified Trust Center for Privacy, Security, Compliance, an...
TrustArc Webinar - Unified Trust Center for Privacy, Security, Compliance, an...
 
Observability Concepts EVERY Developer Should Know (DevOpsDays Seattle)
Observability Concepts EVERY Developer Should Know (DevOpsDays Seattle)Observability Concepts EVERY Developer Should Know (DevOpsDays Seattle)
Observability Concepts EVERY Developer Should Know (DevOpsDays Seattle)
 
Intro to Passkeys and the State of Passwordless.pptx
Intro to Passkeys and the State of Passwordless.pptxIntro to Passkeys and the State of Passwordless.pptx
Intro to Passkeys and the State of Passwordless.pptx
 
AI+A11Y 11MAY2024 HYDERBAD GAAD 2024 - HelloA11Y (11 May 2024)
AI+A11Y 11MAY2024 HYDERBAD GAAD 2024 - HelloA11Y (11 May 2024)AI+A11Y 11MAY2024 HYDERBAD GAAD 2024 - HelloA11Y (11 May 2024)
AI+A11Y 11MAY2024 HYDERBAD GAAD 2024 - HelloA11Y (11 May 2024)
 
ERP Contender Series: Acumatica vs. Sage Intacct
ERP Contender Series: Acumatica vs. Sage IntacctERP Contender Series: Acumatica vs. Sage Intacct
ERP Contender Series: Acumatica vs. Sage Intacct
 
Frisco Automating Purchase Orders with MuleSoft IDP- May 10th, 2024.pptx.pdf
Frisco Automating Purchase Orders with MuleSoft IDP- May 10th, 2024.pptx.pdfFrisco Automating Purchase Orders with MuleSoft IDP- May 10th, 2024.pptx.pdf
Frisco Automating Purchase Orders with MuleSoft IDP- May 10th, 2024.pptx.pdf
 
WebRTC and SIP not just audio and video @ OpenSIPS 2024
WebRTC and SIP not just audio and video @ OpenSIPS 2024WebRTC and SIP not just audio and video @ OpenSIPS 2024
WebRTC and SIP not just audio and video @ OpenSIPS 2024
 
JohnPollard-hybrid-app-RailsConf2024.pptx
JohnPollard-hybrid-app-RailsConf2024.pptxJohnPollard-hybrid-app-RailsConf2024.pptx
JohnPollard-hybrid-app-RailsConf2024.pptx
 
Event-Driven Architecture Masterclass: Challenges in Stream Processing
Event-Driven Architecture Masterclass: Challenges in Stream ProcessingEvent-Driven Architecture Masterclass: Challenges in Stream Processing
Event-Driven Architecture Masterclass: Challenges in Stream Processing
 
Tales from a Passkey Provider Progress from Awareness to Implementation.pptx
Tales from a Passkey Provider  Progress from Awareness to Implementation.pptxTales from a Passkey Provider  Progress from Awareness to Implementation.pptx
Tales from a Passkey Provider Progress from Awareness to Implementation.pptx
 
Easier, Faster, and More Powerful – Alles Neu macht der Mai -Wir durchleuchte...
Easier, Faster, and More Powerful – Alles Neu macht der Mai -Wir durchleuchte...Easier, Faster, and More Powerful – Alles Neu macht der Mai -Wir durchleuchte...
Easier, Faster, and More Powerful – Alles Neu macht der Mai -Wir durchleuchte...
 
Hyatt driving innovation and exceptional customer experiences with FIDO passw...
Hyatt driving innovation and exceptional customer experiences with FIDO passw...Hyatt driving innovation and exceptional customer experiences with FIDO passw...
Hyatt driving innovation and exceptional customer experiences with FIDO passw...
 
Design and Development of a Provenance Capture Platform for Data Science
Design and Development of a Provenance Capture Platform for Data ScienceDesign and Development of a Provenance Capture Platform for Data Science
Design and Development of a Provenance Capture Platform for Data Science
 

AstroAccelerate - GPU Accelerated Signal Processing on the Path to the Square Kilometre Array

  • 1. www.oerc.ox.ac.uk AstroAccelerate GPU accelerated signal processing on the path to the Square Kilometre Array Wes Armour, Karel Adamek, Sofia Dimoudi, Jan Novotny, Nassim Ouannough, Cees Carels Oxford e-Research Centre, Department of Engineering Science University of Oxford 20th March 2019
  • 2. Part One A brief introduction to
  • 3. What is SKA? What does SKA stand for? Square Kilometre Array, so called because it will have an effective collecting area of a square kilometre. What is SKA? SKA is a ground based radio telescope that will span continents. Where will SKA be located? SKA will be built in South Africa and Australia. Core Station Graphic courtesy of Anne Trefethen Example of proposed SKA configuration
  • 4. SKA science SKA will study a wide range of science cases and aims to answer some of the fundamental questions mankind has about the universe we live in. • How do galaxies evolve – What is dark energy? • Tests of General Relativity – Was Einstein correct? • Probing the cosmic dawn – How did stars form? • The cradle of life – Are we alone in the Universe?
  • 6. https://commons.wikimedia.org/wiki/File:Planets_and_sun_size_comparison.jpg (Author: Lsmpascal) Sun Pulsars – size and scale Earth Pulsars are magnetized, rotating neutron stars which emit synchrotron radiation from their poles (Crab Nebula). They are typically 1-3 Solar masses in size, have a diameter of 10-20 Kilometres and a pulse period ranging from milliseconds to seconds. Their magnetic field is offset from the axis of rotation so we observe them as cosmic lighthouses. Pulsar Amherst College Hesteret.al.
  • 7. Credit: FRB110220 Dan Thornton (Manchester) SKA time domain science - Fast Radio Bursts Fast Radio Bursts (FRBs), were first discovered in 2005 by Lorimer et al. They are observed as extremely bright single pulses that are extremely dispersed (meaning that they are likely to be far away, maybe extra galactic). So far around 15 have been observed in survey data. They are of unknown origin, but likely to represent some of the most extreme physics in our Universe. Hence they are extremely interesting objects to study.Time Frequency
  • 9. SKA time domain - data rates The SKA will produce vast amounts of data. In the case of time-domain science we expect the telescope to be able to place ~2000 observing beams on the sky at any one time (there are trivially parallel to compute). The telescope will take 20,000 samples per second for each of those beams and then it will measure power in 4096 frequency channels for each time sample. Each of those individual samples will comprise of 4x8 bits, although we are only really interested in one of the 8 bits of information. Doing the math tells us that we will need to process 160GB/s of relevant data. This is approximately equal to analysing 50 hours of HD television data per second. The most costly computational operations in data processing pipeline are DDTR ~ O(ndms * nbeams * nsamps * nchans ) FDAS ~ O(ndms * nbeams * nsamps * nacc * log(nsamps) * 1/tobs ) Requiring ~2 PetaFLOP of Compute!
  • 10. SKA time domain - signal processing The time domain team is an international team led by Oxford and Manchester. It aims to deliver an end-to-end signal processing pipeline for time domain science performed by SKA (see right). Our work at OeRC has focussed on vertical prototyping activities. We are interested in using many-core technologies, such as GPUs to perform the processing steps within the signal processing pipeline with the aim of achieving real-time processing for the SKA. Image courtesy of Aris Karastergiou Time Domain Team Search for periodic signals search for fast radio bursts
  • 11. Part Four GPU accelerated signal processing library for time-domain radio astronomy.
  • 12. AstroAccelerate AstroAccelerate is a GPU enabled software package that focuses on achieving real-time processing of time-domain radio-astronomy data. It uses the CUDA programming language for NVIDIA GPUs. The massive computational power of modern day GPUs allows the code to perform algorithms such as de- dispersion, single pulse searching and Fourier Domain Acceleration Searching in real-time on very large data-sets which are comparable to those which will be produced by next generation radio-telescopes such as the SKA. https://github.com/AstroAccelerateOrg/astro-accelerate
  • 13. AstroAccelerate - Signal Processing De-dispersion Periodicity Search Harmonic Sum (Deep dive two) Fourier Domain Acceleration search Single Pulse Search (Deep dive one) Radio Frequency Interference Mitigation
  • 14. AstroAccelerate - API • API follows a simple pattern: configure, bind, run. • Select which pipeline modules to run, configure module plan, then bind plan to the API. • API calculates the strategy with the optimal configuration for the plan. • When all strategy objects are ready, the user selected modules are run within a pipeline. Select pipeline modules Configure module plans bind plan to API API calculates optimal strategy Run pipeline Bind input data to API C++ /Python Cees Carels
  • 15. AstroAccelerate - Code Features • Usable as a library (.so) and/or standalone executable. • Examples with instructions on how to compile and link. • Regular releases (semantic versioning). • CMake build system. • Full doxygen documentation and readme. • Automated CI, unit tests. Cees Carels
  • 16. Part Five Deep dive into recent work
  • 17. www.oerc.ox.ac.uk Karel Adámek, Wes Armour Single Pulse Detection
  • 18. Single Pulse Search Single pulse search (SPS) could be done through matched filters these are very sensitive but has problem with “quickly”. Using a Boxcar filter for the single pulse search (SPS): • Allows us to reuse data • Independent of pulse shape • We can trade sensitivity for performance • Less sensitive by design Aim is to detect pulses of different shapes and widths at unknown position within the signal and do it quickly.
  • 19. Single Pulse Search: How to detect pulses with boxcars Position of the boxcar is important We quantify coverage of the pulses by the distance between boxcar filters L. • Pulse may end up between boxcars • By decreasing L we cover pulses better SNR is • Increased by adding signal • Decreased by adding noise Signal’s strength is measured as signal-to-noise ratio (SNR) 𝑆𝑁𝑅 = 𝑥 − 𝜇 𝜎 , Where 𝑥 is the sample value, 𝜇 is the mean and 𝜎 is the standard deviation.
  • 20. Single Pulse Search: How to detect pulses with boxcars Boxcar which is: • too short does not cover pulse fully • too long does add unnecessary noise We need different boxcar widths W to better detect different pulse widths. SNR is • Increased by adding signal • Decreased by adding noise Width of the boxcar filter is also important
  • 21. Single Pulse Search: What do we need to do? Summary: • Position of the boxcar relative to the pulse is important. This is expressed by the distance between boxcars L. • Boxcar width W is important for detection of pulses with different widths. For ideal detection we need to do: at every point Output: Highest SNR detected at given sample. • We do not need to keep values of all boxcar filters just highest SNR!
  • 22. Single Pulse Search: Two algorithms BoxDIT • Starts from ideal Boxcar filter • Top-down – starts with good sensitivity but poor performance • Easily adjustable • Can be very sensitive • Not as fast IGrid • Start from decimation in time (DIT) • Bottom-up – starts with good performance but poor sensitivity • Less flexible • Faster The algorithm must be able to • perform very long boxcar filters; for SKA this is 8000+ samples • Adjustable sensitivity How to adjust sensitivity … and increase performance: • By decreasing/increasing distance between boxcars L • By performing more/less boxcars of different widths W • After some point it is pointless to decrease L without more widths W
  • 23. Single Pulse Search: BoxDIT BoxDIT has two steps: • Decimation in time - is used to control sensitivity • Ideal boxcar filter (Scan) – is calculating boxcar filters. BoxDIT is reusing previously (time) decimated data to build longer boxcar widths. In GPU implementation both steps are performed at once and kernel calculates boxcar filters as well as decimation for next iteration. BOTTOM: Using combinations of data at different decimation levels allows us to construct longer width boxcars. Diagram of the BoxDIT algorithm.
  • 24. Single Pulse Search: BoxDIT Scan at every point Algorithm for scan at every point (applying set of boxcar filters) first calculate small scan at every point (here 4). The value of the longest boxcar (here 4) is stored into shared memory. Stored in registers Stored into shared memory as well
  • 25. Single Pulse Search: BoxDIT scanpart Showing algorithm steps only for every 4th thread. Other threads doing the same thing for other points. Each thread keeps values of boxcar filters in registers. These are increased with every step of the algorithm. In each step i, an active thread A, calculates a source thread id as Si=A-i*4. The value of the longest boxcar calculated at beginning is loaded from source thread (from shared memory) and used to calculate longer boxcars in the active thread. These are kept by the active thread. The highest SNR from the newly calculated boxcars is then compared with the SNR of the source thread and stored at it’s position in shared memory if higher.
  • 26. Single Pulse Search: BoxDIT performance When calculating 32 boxcar per iteration (1% signal loss, idealised case) code is limited by compute with 63% device memory bandwidth utilisation. It is 83x faster then real-time. When calculating 16 boxcars per iteration (2% signal loss, idealised) code is limited by device memory bandwidth (86%). It is 170x faster then real time. Other versions of BoxDIT algorithm
  • 27. Single Pulse Search: Two algorithms BoxDIT • Starts from ideal Boxcar filter • Top-down – starts with good sensitivity but poor performance • Easily adjustable • Not as fast IGrid • Start from decimation in time (DIT) • Bottom-up – starts with good performance but poor sensitivity • Less flexible • Faster
  • 28. Single Pulse Search: IGrid The IGrid algorithm is based on combination of different decimations in time, which are shifted in time samples.
  • 29. Single Pulse Search: IGRID This algorithm could be interpreted also as a binary tree where leaves indicate a shift in number of time samples. The binary tree view suggest what could be calculated locally (red area). Therefore the only thing shared through iterations is the time-series with zero shift. It also offers a way how to calculate different boxcar widths, that is by going to the root of the tree.
  • 30. Single Pulse Search: IGRID To calculate individual IGRID iterations is inefficient and very demanding on device memory bandwidth. Thus we calculate multiple IGRID iterations per thread-block. Points represent layers that must be calculated to get desired sensitivity must be calculated but it will be recalculated by the next block recalculated layer Is calculated but it is not required TOP: to calculate individual IGRID iterations a lot of layers had to be shared BOTTOM: Each thread-block calculates multiple iterations.
  • 31. Single Pulse Search: BoxDIT performance IGrid 1 – Signal loss ~6% IGrid 2 – Signal loss ~4.5% IGrid 3 – Signal loss ~2%
  • 32. Single Pulse Search: Results Number of DM trials/second for SKA-mid sized data is about ~6000. This means BOXDIT is ~83x-200x faster then real time IGRID is ~150x-580x faster then real time. Left: Comparison of algorithms on average signal loss and performance (DM trials).
  • 33. Conclusions • Quantified source of sensitivity loss • Adjustable sensitivity • Two algorithms with different sensitivity/performance ratio • BoxDIT algorithm is in AstroAccelerate and used for science output
  • 34. www.oerc.ox.ac.uk Karel Adámek, Jan Novotný, Wes Armour Harmonic Sum
  • 35. Harmonic sum When searching for pulsars using Fourier domain methods the power of the pulsar is in frequency domain spread to multiple frequency bins. The incoherent harmonic sum algorithm is one way to correct this. TOP: time-series containing pulsar (dots) MIDDLE: frequency-domain harmonics visible BOTTOM: result of harmonic sum for two diff. algorithms
  • 36. Harmonic sum The goal of the harmonic sum is to sum pulsar’s power that was spread into multiple harmonics which are integer multiples of the fundamental frequency 𝑓0 h n 𝐻 = ෍ 𝑖=1 𝐻 𝑃(𝑖𝑓0) , Where H is number of harmonics summed and 𝑃 is the power spectrum. But we do not know 𝑓0 and we work with discrete indices fundamental = 10.33Hz first harmonic = 20.66Hz second harmonic = 31Hz
  • 37. Harmonic sum The defacto pulsar processing code, PRESTO, uses the following formula: h n 𝐻 = 1 𝐻 ෍ 𝑖=1 𝐻 𝑃 𝑛𝑖 𝐻 H is number of harmonics summed. You can approach it from different end. Start with the index position of the fundamental frequency n in frequency domain X[n] and add to it higher harmonics, that is X[2n], …
  • 38. Harmonic sum – Problems
  • 39. Harmonic sum – Problems • Unfavorable access pattern • Difficult data reuse • Apply physical constrains • h*(starting index) • Stride in memory access is h • Transpose of the data might help • Simple data reuse is limited • Cannot increase efficiency by increasing number of fundamental freq. bins • Complicated data reuse is resource and bookkeeping heavy • Decreases frequency range not index range which we need to explore. We aim to provide a selection of algorithm with different ratio of sensitivity/performance
  • 40. Harmonic sum – Tree view There are few possibilities how to do harmonic sum Create all possible sums (exhaustive search) best possible precision
  • 41. Harmonic sum – Tree view There are few possibilities how to do harmonic sum Create all possible sums (exhaustive search) best possible precision Construct the sum from maxima of each harmonic. That is ℎ 𝑛 𝐻 = σ𝑖=1 𝐻 max 1≤𝑗≤𝑖 𝑥 𝑖𝑛 + 𝑗
  • 42. Harmonic sum – Tree view There are few possibilities how to do harmonic sum Create all possible sums (exhaustive search) best possible precision Construct the sum from maxima of each harmonic. That is ℎ 𝑛 𝐻 = σ𝑖=1 𝐻 max 1≤𝑗≤𝑖 𝑥 𝑖𝑛 + 𝑗 Greedy algorithm which selects highest value ℎ 𝑛 𝐻+1 = ℎ 𝑛 𝐻 + max 𝑥 𝐻𝑛 + 𝑗 , 𝑥(𝐻𝑛 + 𝑗 + 1)
  • 43. Harmonic sum – Tree view There are few possibilities how to do harmonic sum Create all possible sums (exhaustive search) best possible precision Construct the sum from maxima of each harmonic. That is ℎ 𝑛 𝐻 = σ𝑖=1 𝐻 max 1≤𝑗≤𝑖 𝑥 𝑖𝑛 + 𝑗 Greedy algorithm which selects highest value ℎ 𝑛 𝐻+1 = ℎ 𝑛 𝐻 + max 𝑥 𝐻𝑛 + 𝑗 , 𝑥(𝐻𝑛 + 𝑗 + 1)
  • 44. Harmonic sum – Tree view There are few possibilities how to do harmonic sum Create all possible sums (exhaustive search) best possible precision Construct the sum from maxima of each harmonic. That is ℎ 𝑛 𝐻 = σ𝑖=1 𝐻 max 1≤𝑗≤𝑖 𝑥 𝑖𝑛 + 𝑗 Greedy algorithm which selects highest value ℎ 𝑛 𝐻+1 = ℎ 𝑛 𝐻 + max 𝑥 𝐻𝑛 + 𝑗 , 𝑥(𝐻𝑛 + 𝑗 + 1) Sum only integer multiples of the fundamental ℎ 𝑛 𝐻 = ෍ 𝑖=1 𝐻 𝑥(𝑖𝑛)
  • 45. Harmonic sum – Simple • Simple harmonic sum Data are transposed, which improve data access Device mem. bandwidth limited (77%) No explicit data reuse caching is poor (10% L2 hit rate) Memory access pattern when threads process neighboring fundamental frequency bins. Memory access pattern when threads process same fundamental frequency bin from different data.
  • 46. Harmonic sum – Greedy • Greedy harmonic sum For data reuse we really on caches (52% L2 hit rate) – kernel is waiting for data Device memory Bandwidth utilization (66%) Memory access pattern when threads process neighboring fundamental frequency bins. Memory access pattern when threads process same fundamental frequency bin from different data.
  • 47. Harmonic sum – Presto, MaxDIT Presto harmonic sum Limited by type conversion (array indexing). However fp32 compute is still high. Max harmonic sum Max harmonic sum is two step 1) Calculate all max decimations - Limited by compute (Load/Store, floating point operations) 2) Calculate partial sums and SNR
  • 48. Harmonic sum – Improvements
  • 49. Harmonic sum – Results As gold standard we have used our implementation of presto‘s HRMS. RIGHT: Sensitivity loss for pulsar frequencies which are between frequency bins. 50% decrease for simple HRMS, only 30% for Greedy and PRESTO. LEFT: average sensitivity as it depends on pulsar’s frequency.
  • 50. Harmonic sum – Conclusions • We have multiple algorithms with different parameters • We need more sensitivity tests and tests of physical correctness (artificial data, real data) • We thinking about 2D harmonic sum for acceleration searches • We trying to increase performance