Najeeb Ahmad
Master Thesis Presentation
May, 2012
Supervisor: Dr. Sun Jinping
Design and Implementation of GPU-based SAR Image Processor
School of Electronic Information Engineering
Beihang University, Beijing, China.
Contents
1. Introduction
2. GPU Computing
3. SAR Processing
4. Implementation
5. Conclusion & Future Work
1. Introduction
 Problem
 Motivation
 Objective
 Methodology
PROBLEM
Synthetic aperture radar (SAR) data processing is
computationally intensive and time-consuming on
conventional CPUs. Given the growing use of GPUs for
scientific computing, this work accelerates the simplified
range Doppler SAR processing algorithm on a GPU using
modern GPGPU technology, aiming for real-time or near
real-time performance, and evaluates the suitability of
the GPU for SAR processing.
MOTIVATION
 Computationally intensive and time-consuming
nature of SAR processing algorithms.
 Inherent parallelism in most SAR processing
algorithms.
 Advent of modern GPGPU technology and
availability of commodity GPUs as general-purpose
computation engines.
 Architectural parallelism and ample hardware
resources in modern GPUs, which make them well
suited to handling large data volumes and to
implementing parallel SAR algorithms.
OBJECTIVE
 To implement and accelerate the simplified range
Doppler SAR processing algorithm on a modern
NVIDIA TESLA GPU using CUDA and MATLAB's GPU
capabilities.
 The research explores areas such as:
 Algorithm adaptation for parallel implementation.
 Suitability of MATLAB for algorithm implementation.
 Suitability of CUDA for algorithm implementation.
 Comparison of CPU/CUDA/MATLAB-GPU
implementations.
 GPU as SAR processing platform.
METHODOLOGY
 Algorithm implementation and verification on Intel
Xeon CPU using MATLAB.
 Identification of parallelizable portions of
algorithm.
 Algorithm implementation on TESLA C1060 GPU
using MATLAB’s native GPU capabilities.
 Algorithm implementation on TESLA C1060 GPU
using CUDA.
 Analysis of CPU, MATLAB-GPU and CUDA
implementations.
2. GPU Computing
 Introduction to GPU Computing
 GPGPU: Brief History
 NVIDIA CUDA
 Writing efficient code
Introduction to GPU Computing
 Use of Graphics Processing Units (GPUs) for
general purpose computing applications.
 CPU: One, four or eight cores. Handles a few
threads at a time. Suitable for serial code.
 GPU: Hundreds of cores. Handles thousands of
threads at a time. Suitable for parallel code.
Introduction to GPU Computing
 GPU Computing Model: Heterogeneous
computing model employing both CPU and GPU
with serial computing on CPU, parallel computing
on GPU.
GPGPU: Brief History
 First use of the GPU as a general-purpose computing
device around 1999-2000, using graphics APIs.
Large performance gains observed, but the approach
remained unpopular because programming was tedious.
 Introduction of NVIDIA's “CUDA” and AMD's
“Stream Computing” in 2007. Beginning of the
modern GPGPU era. Other vendors introduced
their own GPGPU systems.
 NVIDIA's CUDA gaining popularity due to its
maturity and performance.
NVIDIA CUDA
 Compute Unified Device Architecture.
 Comprises an Instruction Set Architecture (ISA)
and a parallel compute engine in the GPU,
programmable with high-level languages
extended for GPU computing.
 The CUDA framework comprises two parts:
hardware and software. From the software
perspective, CUDA means C/C++ and FORTRAN
extended to support GPU computing.
 CUDA is a “Single Instruction Multiple Thread”
(SIMT) architecture.
CUDA Hardware
 Streaming multiprocessor (SM): Basic computing unit
of the GPU. Comprises eight streaming processors
(SPs) and memory. GPUs differ in the number of
SMs and the SP clock frequency.
[Figure: streaming multiprocessor with eight SPs, two SFUs, an MT IU and shared memory]
CUDA Memory Architecture
 Understanding of memory architecture critical for
writing efficient CUDA programs.
 All CUDA-enabled hardware has the following types
of memory:
 Global memory
 Shared memory and registers.
 Texture memory and texture cache.
 Constant memory and constant cache.
 Local memory for register spilling.
[Figure: CUDA memory architecture. Each SM (SM 1 to SM n) contains SPs, shared memory, a texture cache and a constant cache; the GPU's global memory (RAM) also holds local, texture and constant memory.]
NVIDIA TESLA C1060 GPU
 PCI Express 2.0 compliant computing processor
board based on the NVIDIA Tesla T10 graphics
processing unit, targeted at HPC applications.
 Feature highlights
 30 SMs = 240 SPs.
 SP clock = 1.296 GHz.
 4 GB GDDR3 memory with 102 GB/s bandwidth.
 IEEE 754 compliant single- and double-precision
floating point.
 933 GFLOPS single-precision and 78 GFLOPS
double-precision performance.
 Compute capability: 1.3.
 Supported by MATLAB for GPU computing.
CUDA Programming Model
 At its core are thread groups, shared memory and
barrier synchronization.
 Provides coarse-grained data and task
parallelism and fine-grained data and thread
parallelism providing expressivity and scalability.
 Thread hierarchy: Grid, blocks, threads.
 Kernels: Functions executed on device (GPU) in
parallel threads.
 CUDA provides APIs to run and launch kernels in
parallel threads and to synchronize them.
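The grid/block/thread hierarchy can be illustrated on the host. The following is a minimal Python sketch, not CUDA code: `launch`, `scale_kernel` and the sizes used are hypothetical names and values chosen for illustration. It emulates serially how each thread derives its global index from its block and thread indices.

```python
# Host-side emulation of CUDA's grid/block/thread indexing (illustrative only).

def blocks_needed(n_items, threads_per_block):
    """Ceiling division: number of blocks so that every item gets a thread."""
    return (n_items + threads_per_block - 1) // threads_per_block


def launch(kernel, n_blocks, threads_per_block, *args):
    """Call `kernel` once per (block, thread) pair; a serial stand-in for a GPU launch."""
    for block_idx in range(n_blocks):
        for thread_idx in range(threads_per_block):
            kernel(block_idx, threads_per_block, thread_idx, *args)


def scale_kernel(block_idx, block_dim, thread_idx, data, factor, out):
    # Same global-index formula a 1D CUDA kernel would use:
    # i = blockIdx.x * blockDim.x + threadIdx.x
    i = block_idx * block_dim + thread_idx
    if i < len(data):  # guard: threads past the end of the data do nothing
        out[i] = data[i] * factor


data = [float(v) for v in range(100)]
out = [0.0] * len(data)
n_blocks = blocks_needed(len(data), 64)  # hypothetical block size of 64 threads
launch(scale_kernel, n_blocks, 64, data, 2.0, out)
```

On a real GPU the two loops in `launch` disappear: all (block, thread) pairs run in parallel, which is why the bounds guard inside the kernel is needed.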
Processing Flow
 Copy input data from CPU to GPU memory.
 Load GPU program and execute, caching result
on the device.
 Copy results from GPU to CPU.
[Figure: host (CPU and RAM) exchanging data with the device (GPU with global, constant and texture memory)]
Writing Efficient Code
 High priority considerations
 Minimum CPU-GPU transfers.
 Use of coalesced data transfers.
 Use of shared memory instead of global memory
whenever possible.
 Avoiding different execution paths within a warp.
 Medium priority considerations
 Access to shared memory should be planned to
avoid serialization.
 Redundant data transfers from global memory
should be avoided.
Writing Efficient Code
 Threads per block should be a multiple of 32.
 Use of fast math library whenever possible.
 Low priority considerations
 Use of zero copy operations.
 For kernels with long argument lists, some arguments
should be placed in constant memory.
 Expensive modulo and division operations should be
avoided in favor of shift operations whenever
possible.
 Automatic conversion of double to float should be
avoided.
 Loop unrolling should be used whenever possible.
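Two of these guidelines reduce to simple integer arithmetic. The sketch below assumes a warp size of 32 (the value behind the "multiple of 32" rule); the helper names are hypothetical. It shows rounding a block size up to a whole number of warps, and replacing division and modulo by a power of two with a shift and a mask.

```python
WARP_SIZE = 32  # threads per warp, the source of the "multiple of 32" rule


def round_up_to_warp(threads):
    """Smallest multiple of WARP_SIZE that is >= threads."""
    return ((threads + WARP_SIZE - 1) // WARP_SIZE) * WARP_SIZE


def is_power_of_two(n):
    return n > 0 and (n & (n - 1)) == 0


def shift_divmod(i, n):
    """Quotient and remainder of i / n for non-negative i and power-of-two n,
    computed with a right shift and a bit mask instead of / and %."""
    if i < 0 or not is_power_of_two(n):
        raise ValueError("requires i >= 0 and n a power of two")
    shift = n.bit_length() - 1      # log2(n)
    return i >> shift, i & (n - 1)


block_size = round_up_to_warp(50)        # 50 requested threads -> whole warps
row, col = shift_divmod(1000, 64)        # position of element 1000 in rows of 64
```

The shift-and-mask form only applies when the divisor is a power of two, which is one reason array dimensions and block sizes are often chosen that way.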
3. SAR Processing
 What is Synthetic Aperture Radar
 SAR Processing
 Processing Algorithms
 Basic RDA
 Simplified RDA
What is Synthetic Aperture Radar
 An active microwave remote sensing imaging
system.
 Employs long range propagation characteristics
of radar and complex signal processing
techniques to produce high resolution images.
 High resolution achieved by synthesizing long
antenna aperture through signal processing
techniques.
 Pros (in comparison with optical systems):
 All-weather, day-and-night operation.
 Unaffected by atmospheric constituents.
 Sensitivity to dielectric properties (can image ice,
biomass etc.)
 Sensitivity to surface roughness (oceans, wind
What is Synthetic Aperture Radar
 Accurate measurement of distance.
 Sensitivity to man made objects.
 Sensitivity to target structure.
 Subsurface penetration.
 Cons
 Complex interactions (difficult to visualize and
understand)
 Speckle effects (make visual interpretation difficult)
 Topographic effects
SAR Processing
 A set of procedures to obtain an interpretable image
from raw data that is spread in the azimuth and range
directions.
 In range, the data is spread over the duration of the
transmitted FM pulse.
 In azimuth, the data is spread over the time for which
a point target is illuminated by the radar beam.
 SAR processing compresses this data, taking into
account range cell migration, earth curvature,
earth rotation and air/spacecraft attitude noise, to
produce the final image.
 Given the nature of the SAR system and its signals,
signal processing rather than image processing provides
the appropriate tools for SAR processing.
SAR Processing Algorithms
 Mainstream SAR processing algorithms include:
 Range Doppler algorithm (RDA)
 High resolution images for low squint and for relatively
smaller aperture sizes. Very popular.
 Chirp scaling algorithm (CSA)
 Two-dimensional operations with range independence
followed by range corrections in range Doppler domain.
 Omega-K algorithm (ωKA)
 Efficient and accurate in two-dimensional frequency
domain.
 SPECAN algorithm
 Good for medium to low resolution requirements.
Range Doppler Algorithm
 Versions of range Doppler:
 Basic RDA
 RDA with accurate SRC
 RDA with approximate SRC
 Simplified range Doppler
Basic RDA
Raw data
1. Range compression: range FFT, matched filter multiply, range IFFT.
2. Azimuth FFT: data in range Doppler domain.
3. RCMC: interpolation operation in range Doppler domain.
4. Azimuth compression: azimuth matched filter multiply.
5. Azimuth IFFT and lookup summation: to bring the signal back into the time domain.
Final image
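The range compression step (range FFT, matched filter multiply, range IFFT) can be sketched for a single range line. This is a minimal illustration, not the thesis code: the chirp rate, pulse length and delay are hypothetical values, and a small hand-written radix-2 FFT stands in for the MATLAB and CUFFT routines so that the example is self-contained.

```python
import cmath
import math


def fft(x, inverse=False):
    """Recursive radix-2 FFT (unscaled); len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    sign = 1.0 if inverse else -1.0
    even = fft(x[0::2], inverse)
    odd = fft(x[1::2], inverse)
    out = [0j] * n
    for k in range(n // 2):
        tw = cmath.exp(sign * 2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + tw
        out[k + n // 2] = even[k] - tw
    return out


def ifft(x):
    """Inverse FFT with the 1/N scaling applied."""
    n = len(x)
    return [v / n for v in fft(x, inverse=True)]


def chirp(n_samples, k_rate):
    """Baseband linear FM pulse exp(j*pi*K*t^2), t in samples, centred on the pulse."""
    mid = (n_samples - 1) / 2.0
    return [cmath.exp(1j * math.pi * k_rate * (i - mid) ** 2) for i in range(n_samples)]


def range_compress(line, pulse):
    """Range FFT, matched filter multiply, range IFFT for one range line."""
    padded = list(pulse) + [0j] * (len(line) - len(pulse))
    matched = [v.conjugate() for v in fft(padded)]  # conjugate of the pulse spectrum
    spectrum = fft(line)
    return ifft([s * h for s, h in zip(spectrum, matched)])


# Hypothetical scene: one point target whose echo starts at sample DELAY.
N, PULSE_LEN, DELAY = 256, 64, 100
pulse = chirp(PULSE_LEN, k_rate=0.01)
line = [0j] * N
for i, v in enumerate(pulse):
    line[DELAY + i] = v

compressed = range_compress(line, pulse)
peak = max(range(N), key=lambda i: abs(compressed[i]))
```

The echo, spread over `PULSE_LEN` samples before compression, collapses to a peak at index `DELAY`. Every range line is processed independently in the same way, which is the parallelism a GPU implementation can exploit.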
Simplified RDA
 For narrower swath width and medium resolution
requirements, RCM can be assumed independent
of range.
Raw data
1. Pre-filtering: to remove Doppler centroid.
2. Range compression: range FFT, matched filter multiply (no range IFFT).
3. Azimuth FFT: both range and azimuth in frequency domain.
4. RCMC: RCM phase function multiply with each range line.
5. Range IFFT: data in range Doppler domain.
6. Azimuth compression.
7. Azimuth IFFT and lookup summation.
Final image
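Because the range dimension is still in the frequency domain at the RCMC step, a range shift can be applied as a multiplication by a linear phase ramp (the Fourier shift theorem) rather than by interpolation. The sketch below illustrates only that principle: the 3-cell migration is a hypothetical fixed value, whereas in the algorithm the shift would vary with azimuth frequency, and the actual RCM phase function of the thesis is not reproduced here.

```python
import cmath
import math


def dft(x, inverse=False):
    """Direct O(N^2) DFT, adequate for a short demonstration line."""
    n = len(x)
    sign = 1.0 if inverse else -1.0
    out = [sum(x[m] * cmath.exp(sign * 2j * math.pi * k * m / n) for m in range(n))
           for k in range(n)]
    return [v / n for v in out] if inverse else out


def rcm_phase(n, shift_samples):
    """Linear phase ramp that moves a range line earlier by `shift_samples` cells."""
    # Signed frequency bins (0, 1, ..., n/2-1, -n/2, ..., -1) in cycles per sample.
    freqs = [(k if k < n // 2 else k - n) / n for k in range(n)]
    return [cmath.exp(2j * math.pi * f * shift_samples) for f in freqs]


def correct_line(range_spectrum, shift_samples):
    """Multiply one range line (in the range-frequency domain) by the phase ramp."""
    phase = rcm_phase(len(range_spectrum), shift_samples)
    return [s * p for s, p in zip(range_spectrum, phase)]


N = 32
line = [0j] * N
line[13] = 1 + 0j                       # target that has migrated from cell 10 to cell 13
spectrum = dft(line)                    # range FFT
corrected = dft(correct_line(spectrum, 3), inverse=True)   # phase multiply, range IFFT
peak = max(range(N), key=lambda i: abs(corrected[i]))
```

After the phase multiply and the range IFFT the target sits back in cell 10. Each range line needs only an element-wise complex multiply, which maps naturally onto one GPU thread per sample.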
4. Implementation
 Hardware resources
 Software resources
 CPU Implementation
 MATLAB GPU Implementation
 CUDA Implementation
 Result Comparison
Hardware resources
CPU
Name                       Intel Xeon E5504
CPU clock                  2 GHz
# of cores                 4
System memory              4 GB
DDR3 clock                 800 MHz
Maximum memory bandwidth   19.2 GB/s
Memory type                DDR3 PC3
PCI slot                   PCI Express
GPU
Name                       NVIDIA Tesla C1060
# of cores                 240
SP clock                   1.296 GHz
Memory                     4 GB GDDR3
Maximum memory bandwidth   102 GB/s
Memory interface           512 bit – PCI Express
GFLOPS                     933 single precision, 78 double precision
Software resources
CPU GPU
 Windows 7 Ultimate 64-bit
 MATLAB release 2010b
 Visual Studio 2008 SP1
 CUDA Toolkit 4.1
 MATLAB release 2010b
 NVIDIA Parallel Nsight
 Visual Profiler
 CUDA MEMCHECK
 CUFFT library
RADARSAT – I Data
• CEOS format.
• Raw data must be extracted from the CEOS data
before the SAR processing algorithm can be applied.
Table: RADARSAT – I data parameters
Parameter                    Value      Units
Sampling rate                32.317     MHz
Range FM rate                0.72135    MHz/µs
Pulse duration               41.74      µs
Radar frequency              5.3        GHz
Radar wavelength             0.05657    m
Pulse repetition frequency   1256.98    Hz
Effective radar velocity     7062       m/s
Azimuth FM rate              1733       Hz/s
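A few derived quantities follow directly from the tabulated parameters and serve as a sanity check on the values. The sketch below uses only standard relations (wavelength = c / f, chirp bandwidth = FM rate × pulse duration); the variable names are chosen for illustration.

```python
C = 299_792_458.0                      # speed of light, m/s

SAMPLING_RATE_HZ = 32.317e6
RANGE_FM_RATE_HZ_PER_S = 0.72135e12    # 0.72135 MHz/us expressed in Hz/s
PULSE_DURATION_S = 41.74e-6
RADAR_FREQ_HZ = 5.3e9

# Wavelength from the carrier frequency; compare with the tabulated value.
wavelength_m = C / RADAR_FREQ_HZ

# Bandwidth swept by the linear FM pulse.
chirp_bandwidth_hz = RANGE_FM_RATE_HZ_PER_S * PULSE_DURATION_S

# Number of range samples occupied by one transmitted pulse.
samples_per_pulse = SAMPLING_RATE_HZ * PULSE_DURATION_S

# Ratio of sampling rate to chirp bandwidth (must exceed 1 for complex sampling).
oversampling = SAMPLING_RATE_HZ / chirp_bandwidth_hz
```

The computed wavelength is close to the tabulated 0.05657 m, the chirp bandwidth comes to about 30.1 MHz, and each pulse spans roughly 1349 range samples, which is the length of the range matched filter before zero padding.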
CEOS data → CEOS data extraction utility → raw SAR data
SAR Processing GUI
Functions
• CEOS data extraction.
• MATLAB-CPU SAR processing.
• MATLAB-GPU SAR processing.
• CUDA input/output manipulation.
• CUDA program execution.
CPU Implementation
 Implemented using MATLAB
 FFT/IFFT using standard MATLAB functions
CPU Processed SAR image
A 2048 x 4096 SAR image using CPU based implementation
MATLAB-GPU Implementation
 MATLAB has supported GPU computing since
release 2010b.
 Implemented using native MATLAB-GPU functions only
(no CUDA kernel calls).
 Vectorization strategy employed to implement
vector-matrix multiplications on the GPU.
 All FFT/IFFTs performed using MATLAB-GPU FFT/IFFT
support functions.
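The slides do not show the vectorization itself, so the following is a sketch of one plausible reading: the filter vector is replicated into a full matrix (as MATLAB's repmat would do) so that the vector-matrix product becomes a single element-wise multiply. The function names and the tiny data set are hypothetical. The second helper gives the same result without the replicated copy, in the spirit of bsxfun.

```python
def replicate_columns(vector, n_cols):
    """Build a len(vector) x n_cols matrix whose every column equals `vector`."""
    return [[v] * n_cols for v in vector]


def elementwise_multiply(a, b):
    """Element-by-element product of two equally sized matrices."""
    return [[x * y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]


def multiply_columns_by_vector(matrix, vector):
    """Same result without materialising the replicated matrix."""
    return [[v * x for x in row] for v, row in zip(vector, matrix)]


data = [[1 + 1j, 2 + 0j, 3 - 1j],
        [0 + 2j, 1 + 0j, 4 + 0j]]
matched_filter = [2 + 0j, 0 + 1j]   # one coefficient per row, applied to every column

replicated = replicate_columns(matched_filter, n_cols=3)
vectorized = elementwise_multiply(replicated, data)
broadcast = multiply_columns_by_vector(data, matched_filter)
```

The replicated matrix is as large as the data itself, so this form of vectorization roughly doubles the memory needed for the multiply.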
[Figure: vectorization strategy, data arranged as Column 1, Column 2, ..., Column n]
MATLAB-GPU Implementation
 Limit on maximum image size that can be
calculated due to GPU memory constraints.
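The memory limit can be estimated with simple arithmetic. The sketch below assumes single-precision complex samples (8 bytes each) and takes 4 GB as 4 × 1024³ bytes; the number of simultaneous buffers is a hypothetical parameter, since the working set of the MATLAB implementation is not given in the slides.

```python
BYTES_PER_COMPLEX_SINGLE = 8           # two 32-bit floats per sample (assumed precision)
GPU_MEMORY_BYTES = 4 * 1024 ** 3       # 4 GB on the Tesla C1060


def buffer_bytes(rows, cols, n_buffers=1):
    """Bytes needed for n_buffers complex single-precision arrays of rows x cols."""
    return rows * cols * BYTES_PER_COMPLEX_SINGLE * n_buffers


def fits_on_gpu(rows, cols, n_buffers):
    """True if the buffers alone fit in GPU memory (ignores runtime overheads)."""
    return buffer_bytes(rows, cols, n_buffers) <= GPU_MEMORY_BYTES


one_image = buffer_bytes(2048, 4096)           # a single 2048 x 4096 array
small_ok = fits_on_gpu(2048, 4096, n_buffers=3)
large_ok = fits_on_gpu(16384, 16384, n_buffers=3)
```

A single 2048 x 4096 array takes 64 MiB, so a few working copies fit easily, whereas three 16384 x 16384 arrays would need 6 GiB and exceed the card's memory.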
MATLAB-GPU Implementation
 Speedup of up to 21 times achieved compared with
the CPU implementation.
MATLAB-GPU Implementation
A 2048 x 4096 SAR image using MATLAB-GPU based implementation
MATLAB-GPU Implementation
 Advantages
 Quick and easy to implement
 Sufficient speedups obtained with little effort
 Little knowledge of GPU hardware and no
knowledge of optimization techniques required.
 Disadvantages
 Currently, a limited number of MATLAB functions
are supported on the GPU.
 Not all overloads of a function are available for the GPU.
 Less control over hardware resources and memory.
 Few optimization options.
CUDA Implementation
 Strategy
 Signal data read as binary file
 Vectors, matched filters calculated on CPU
 Vectors/signal data transferred to GPU
 Following kernels executed in order on GPU
 Pre-filtering kernel
 Range compression kernel
 RCMC kernel
 Azimuth compression kernel
 Image pixel calculation kernel
 Data transferred from GPU to CPU and saved on
disk as image.
Optimization considerations
 Chosen block size = 8 × 8 = 64 threads. Conforms with
memory coalescing requirements.
 Constant variables stored in constant memory.
 Local variables and phase functions calculated inside
the kernels whenever possible to reduce global memory
access.
 CPU-GPU data transfer kept to a minimum: data is
transferred from CPU to GPU at the beginning and
from GPU to CPU at the end of the algorithm.
 CUFFT's cufftPlanMany() plans used for
FFT/IFFTs along data columns.
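cufftPlanMany() describes a batch of 1D transforms through a stride (distance between consecutive samples of one transform) and a distance (offset between the starts of consecutive transforms). The sketch below assumes a row-major array, which the slides do not state, and uses hypothetical helper names; it only demonstrates the index arithmetic of that layout, not a CUFFT call.

```python
def batch_element_index(batch, x, istride, idist):
    """Linear index of sample x of transform number `batch` in a strided 1D batch layout."""
    return batch * idist + x * istride


def column_layout(rows, cols):
    """Layout parameters for one transform per column of a row-major rows x cols array."""
    return {"n": rows, "istride": cols, "idist": 1, "batch": cols}


def gather(flat, layout, batch):
    """Collect the samples that transform number `batch` would read."""
    return [flat[batch_element_index(batch, x, layout["istride"], layout["idist"])]
            for x in range(layout["n"])]


rows, cols = 3, 4
flat = list(range(rows * cols))          # row-major 3 x 4 array holding 0..11
layout = column_layout(rows, cols)
second_column = gather(flat, layout, batch=1)
```

With these parameters every column is transformed in one batched plan, without first transposing the data so that columns become contiguous.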
CUDA Implementation Results
A 2048 x 4096 SAR image using CUDA based implementation
CUDA Implementation Results
[Figure: CUDA/MATLAB-GPU/MATLAB-CPU computation time comparison]
[Figure: MATLAB-GPU/CUDA computation time comparison]
[Figure: MATLAB-GPU/CUDA speedup comparison]
 Speedups of up to 53 times achieved with CUDA,
compared with a maximum speedup of 21 times
with MATLAB-GPU.
5. Conclusions & Future Work
Conclusions
 Feasibility of GPU for SAR processing
 Amount of data, computational effort and inherent
algorithm parallelism make SAR processing
well suited to the GPU.
 TESLA C1060 GPU offers enough memory to
handle various common SAR image sizes.
 Cooling the GPU may be a challenge in some
environments.
 Scalability of CUDA should make it easier to
port existing SAR code to newer GPUs.
 GPUs might not be suitable where customizable
hardware is required or military hardware standards
must be adhered to.
Conclusions
 MATLAB-GPU based SAR Processing
 Significant speedups compared with CPU.
 Quick and easy to implement.
 Has some limitations:
 Currently fewer functions are supported on the GPU. Expected to
improve with future MATLAB releases.
 Vectorization strategy needs more memory. Future releases
promise to remove the need for vectorization (e.g. bsxfun in
release 2012a).
 Less control over GPU resources (memory etc.).
 CUDA SAR Processing
 CUDA: Flexible and scalable, with a minimal learning curve.
 More control over GPU resources.
 Optimization strategies can be applied.
 Faster and more memory efficient than MATLAB
implementation.
Conclusions
 Downsides of GPU
 Significant testing/verification effort might be
required if the GPU hardware has to be upgraded
(because the old hardware has become obsolete).
 Proprietary nature of CUDA might be problematic if
the company discontinues CUDA or its support.
Future work
 CUDA kernels can be called in MATLAB code
using MATLAB’s CUDA kernel calling support.
 MATLAB GPU implementation can be improved
as newer and better functions become available.
 C/C++ based CPU implementation can be
developed to better judge MATLAB-CPU/CUDA
performance.
 Other SAR processing algorithms can be
implemented using framework laid out in this
project.
Q & A
Thank You

 
(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Service
(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Service(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Service
(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Serviceranjana rawat
 
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...Dr.Costas Sachpazis
 
Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur Escorts
Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur EscortsCall Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur Escorts
Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur EscortsCall Girls in Nagpur High Profile
 
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...Soham Mondal
 
Processing & Properties of Floor and Wall Tiles.pptx
Processing & Properties of Floor and Wall Tiles.pptxProcessing & Properties of Floor and Wall Tiles.pptx
Processing & Properties of Floor and Wall Tiles.pptxpranjaldaimarysona
 
Introduction to IEEE STANDARDS and its different types.pptx
Introduction to IEEE STANDARDS and its different types.pptxIntroduction to IEEE STANDARDS and its different types.pptx
Introduction to IEEE STANDARDS and its different types.pptxupamatechverse
 

Recently uploaded (20)

UNIT-III FMM. DIMENSIONAL ANALYSIS
UNIT-III FMM.        DIMENSIONAL ANALYSISUNIT-III FMM.        DIMENSIONAL ANALYSIS
UNIT-III FMM. DIMENSIONAL ANALYSIS
 
Coefficient of Thermal Expansion and their Importance.pptx
Coefficient of Thermal Expansion and their Importance.pptxCoefficient of Thermal Expansion and their Importance.pptx
Coefficient of Thermal Expansion and their Importance.pptx
 
SPICE PARK APR2024 ( 6,793 SPICE Models )
SPICE PARK APR2024 ( 6,793 SPICE Models )SPICE PARK APR2024 ( 6,793 SPICE Models )
SPICE PARK APR2024 ( 6,793 SPICE Models )
 
Top Rated Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
Top Rated  Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...Top Rated  Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
Top Rated Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
 
Call Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service Nashik
Call Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service NashikCall Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service Nashik
Call Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service Nashik
 
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINE
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINEMANUFACTURING PROCESS-II UNIT-2 LATHE MACHINE
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINE
 
IMPLICATIONS OF THE ABOVE HOLISTIC UNDERSTANDING OF HARMONY ON PROFESSIONAL E...
IMPLICATIONS OF THE ABOVE HOLISTIC UNDERSTANDING OF HARMONY ON PROFESSIONAL E...IMPLICATIONS OF THE ABOVE HOLISTIC UNDERSTANDING OF HARMONY ON PROFESSIONAL E...
IMPLICATIONS OF THE ABOVE HOLISTIC UNDERSTANDING OF HARMONY ON PROFESSIONAL E...
 
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
 
★ CALL US 9953330565 ( HOT Young Call Girls In Badarpur delhi NCR
★ CALL US 9953330565 ( HOT Young Call Girls In Badarpur delhi NCR★ CALL US 9953330565 ( HOT Young Call Girls In Badarpur delhi NCR
★ CALL US 9953330565 ( HOT Young Call Girls In Badarpur delhi NCR
 
Extrusion Processes and Their Limitations
Extrusion Processes and Their LimitationsExtrusion Processes and Their Limitations
Extrusion Processes and Their Limitations
 
Porous Ceramics seminar and technical writing
Porous Ceramics seminar and technical writingPorous Ceramics seminar and technical writing
Porous Ceramics seminar and technical writing
 
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
 
Booking open Available Pune Call Girls Koregaon Park 6297143586 Call Hot Ind...
Booking open Available Pune Call Girls Koregaon Park  6297143586 Call Hot Ind...Booking open Available Pune Call Girls Koregaon Park  6297143586 Call Hot Ind...
Booking open Available Pune Call Girls Koregaon Park 6297143586 Call Hot Ind...
 
(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Service
(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Service(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Service
(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Service
 
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
 
Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur Escorts
Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur EscortsCall Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur Escorts
Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur Escorts
 
Call Us -/9953056974- Call Girls In Vikaspuri-/- Delhi NCR
Call Us -/9953056974- Call Girls In Vikaspuri-/- Delhi NCRCall Us -/9953056974- Call Girls In Vikaspuri-/- Delhi NCR
Call Us -/9953056974- Call Girls In Vikaspuri-/- Delhi NCR
 
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
 
Processing & Properties of Floor and Wall Tiles.pptx
Processing & Properties of Floor and Wall Tiles.pptxProcessing & Properties of Floor and Wall Tiles.pptx
Processing & Properties of Floor and Wall Tiles.pptx
 
Introduction to IEEE STANDARDS and its different types.pptx
Introduction to IEEE STANDARDS and its different types.pptxIntroduction to IEEE STANDARDS and its different types.pptx
Introduction to IEEE STANDARDS and its different types.pptx
 

Design and implementation of GPU-based SAR image processor

  • 1. Design and Implementation of GPU based SAR Image Processor
      • Master Thesis Presentation, May 2012.
      • Najeeb Ahmad. Supervisor: Dr. Sun Jinping.
      • School of Electronic Information Engineering, Beihang University, Beijing, China.
  • 2. Contents
      • 1. Introduction
      • 2. GPU Computing
      • 3. SAR Processing
      • 4. Implementation
      • 5. Conclusion & Future Work
  • 4. PROBLEM Synthetic Aperture Radar (SAR) data processing is computationally intensive and time consuming on conventional CPUs. Given the growing use of GPUs for scientific computing, the task is to accelerate the simplified range Doppler SAR processing algorithm on a GPU using modern GPGPU technology, aiming for real-time or near-real-time performance, and to evaluate the GPU's suitability for SAR processing.
  • 5. MOTIVATION
      • Computationally intensive and time-consuming nature of SAR processing algorithms.
      • Inherent parallelism in most SAR processing algorithms.
      • Advent of modern GPGPU technology and availability of commodity GPUs as general-purpose computation engines.
      • Architectural parallelism and ample hardware resources in modern GPUs, which make them especially useful for handling large data volumes and for parallel SAR algorithm implementation.
  • 6. OBJECTIVE
      • To implement and accelerate the simplified range Doppler SAR processing algorithm on a modern NVIDIA TESLA GPU using CUDA and MATLAB-GPU capabilities.
      • The research explores areas such as:
          • Algorithm adaptation for parallel implementation.
          • Suitability of MATLAB for algorithm implementation.
          • Suitability of CUDA for algorithm implementation.
          • Comparison of CPU/CUDA/MATLAB-GPU implementations.
          • GPU as SAR processing platform.
  • 7. METHODOLOGY
      • Algorithm implementation and verification on Intel Xeon CPU using MATLAB.
      • Identification of parallelizable portions of the algorithm.
      • Algorithm implementation on TESLA C1060 GPU using MATLAB's native GPU capabilities.
      • Algorithm implementation on TESLA C1060 GPU using CUDA.
      • Analysis of CPU, MATLAB-GPU and CUDA implementations.
  • 8. 2. GPU Computing
      • Introduction to GPU Computing
      • GPGPU: Brief History
      • NVIDIA CUDA
      • Writing efficient code
  • 9. Introduction to GPU Computing
      • Use of Graphics Processing Units (GPUs) for general purpose computing applications.
      • CPU: Single, four or eight cores. Capable of handling few threads. Suitable for serial code.
      • GPU: Hundreds of cores. Capable of handling hundreds of threads. Suitable for parallel code.
  • 10. Introduction to GPU Computing
      • GPU Computing Model: Heterogeneous computing model employing both CPU and GPU, with serial computing on CPU and parallel computing on GPU.
  • 11. GPGPU: Brief History
      • First use of GPU as a general purpose computing device around 1999-2000, using graphics APIs. Huge performance boosts observed. Generally unpopular due to tedious programming.
      • Introduction of NVIDIA's "CUDA" and AMD's "Stream Computing" in 2007. Beginning of the modern GPGPU era. Other vendors introduced their own GPGPU systems.
      • NVIDIA's CUDA gaining popularity due to its maturity and performance.
  • 12. NVIDIA CUDA
      • Compute Unified Device Architecture.
      • Comprises an Instruction Set Architecture (ISA) and a parallel compute engine in the GPU, programmable with high level languages extended for GPU computing.
      • The CUDA framework comprises two parts: hardware and software. From the software perspective, CUDA means C/C++ and FORTRAN extended to support GPU computing.
      • CUDA is a "Single Instruction Multiple Thread" (SIMT) architecture.
  • 13. CUDA Hardware
      • Streaming multiprocessor (SM): Basic computing unit of the GPU. Comprises eight streaming processors (SP) and memory. Different GPUs differ in number of SMs and SP clock frequency.
      • [Figure: one SM with eight SPs, two SFUs, MT IU and shared memory]
  • 14. CUDA Memory Architecture
      • Understanding of the memory architecture is critical for writing efficient CUDA programs.
      • All CUDA-enabled hardware has the following types of memory:
          • Global memory.
          • Shared memory and registers.
          • Texture memory and texture cache.
          • Constant memory and constant cache.
          • Local memory for register spilling.
      • [Figure: SM 1 to SM n, each with SPs, shared memory, texture cache and constant cache, above GPU global memory (RAM) holding local, texture and constant memory]
  • 15. NVIDIA TESLA C1060 GPU
      • PCI Express 2.0 compliant computing processor board based on the NVIDIA Tesla T10 graphics processing unit, targeted at HPC applications.
      • Feature highlights:
          • 30 SMs = 240 SPs.
          • SP clock = 1.296 GHz.
          • 4 GB GDDR3 memory with 102 GB/s bandwidth.
          • IEEE 754 single and double precision floating point compliant.
          • 933 GFLOPS single and 78 GFLOPS double precision performance.
          • Compute capability: 1.3.
          • Supported by MATLAB for GPU computing.
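The headline figures on this slide can be cross-checked with a little arithmetic. The sketch below is only an illustration: the per-clock FLOP counts (three single-precision FLOPs per SP, one double-precision fused multiply-add per SM) are assumptions that are not stated on the slide.

```python
# Figures taken from the slide.
SM_COUNT = 30
SP_PER_SM = 8
SP_CLOCK_GHZ = 1.296

# Assumption: each SP issues a multiply-add plus a multiply per clock (3 FLOPs).
SP_FLOPS_PER_CLOCK = 3
# Assumption: each SM has one double-precision unit doing a fused multiply-add (2 FLOPs).
DP_FLOPS_PER_SM_PER_CLOCK = 2

sp_total = SM_COUNT * SP_PER_SM
peak_single_gflops = sp_total * SP_CLOCK_GHZ * SP_FLOPS_PER_CLOCK
peak_double_gflops = SM_COUNT * SP_CLOCK_GHZ * DP_FLOPS_PER_SM_PER_CLOCK

print(sp_total, round(peak_single_gflops, 2), round(peak_double_gflops, 2))
```

Under these assumptions the results round to the 240 SPs, 933 GFLOPS and 78 GFLOPS quoted above.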
  • 16. CUDA Programming Model
      • At its core are thread groups, shared memory and barrier synchronization.
      • Provides coarse-grained data and task parallelism and fine-grained data and thread parallelism, giving expressivity and scalability.
      • Thread hierarchy: Grid, blocks, threads.
      • Kernels: Functions executed on the device (GPU) in parallel threads.
      • CUDA provides APIs to launch kernels in parallel threads and to synchronize them.
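The grid/block/thread hierarchy is easiest to see through the index arithmetic each thread performs. The following is a sequential Python emulation, not CUDA code: `launch`, `global_index` and `scale_kernel` are hypothetical names, and the loops stand in for threads that a GPU would run in parallel.

```python
def global_index(block_idx, block_dim, thread_idx):
    """Flat element index of one thread, as in blockIdx.x * blockDim.x + threadIdx.x."""
    return block_idx * block_dim + thread_idx

def launch(grid_dim, block_dim, kernel, *args):
    """Emulate a 1-D kernel launch by visiting every (block, thread) pair in turn."""
    for block_idx in range(grid_dim):
        for thread_idx in range(block_dim):
            kernel(global_index(block_idx, block_dim, thread_idx), *args)

def scale_kernel(i, data, factor, out):
    """Per-thread work: one element each, with a bounds guard for the last block."""
    if i < len(data):
        out[i] = data[i] * factor

data = [1.0, 2.0, 3.0, 4.0, 5.0]
out = [0.0] * len(data)
block_dim = 4
grid_dim = (len(data) + block_dim - 1) // block_dim  # round up so every element is covered
launch(grid_dim, block_dim, scale_kernel, data, 2.0, out)
print(grid_dim, out)
```

The bounds guard matters because the grid is rounded up: the second block here has three threads with nothing to do.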
  • 17. Processing Flow
      • Copy input data from CPU to GPU memory.
      • Load GPU program and execute, caching results on the device.
      • Copy results from GPU to CPU.
      • [Figure: host (CPU, RAM) connected to device (GPU with global, constant and texture memory)]
  • 18. Writing Efficient Code
      • High priority considerations
          • Minimum CPU-GPU transfers.
          • Use of coalesced data transfers.
          • Use of shared memory instead of global memory whenever possible.
          • Avoiding different execution paths within a warp.
      • Medium priority considerations
          • Access to shared memory should be planned to avoid serialization.
          • Redundant data transfers from global memory should be avoided.
  • 19. Writing Efficient Code
      • Medium priority considerations (continued)
          • Threads per block should be a multiple of 32.
          • Use of the fast math library whenever possible.
      • Low priority considerations
          • Use of zero copy operations.
          • For kernels with long argument lists, some arguments should be placed in constant memory.
          • Expensive modulo and division operations should be avoided in favor of shift operations whenever possible.
          • Automatic conversion of double to float should be avoided.
          • Loop unrolling should be used whenever possible.
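Two of the points above (block sizes that are multiples of 32, and shifts in place of modulo and division) rest on the same power-of-two arithmetic. A minimal sketch, written in Python purely for illustration; the function names are hypothetical and the identities only hold for non-negative integers and power-of-two divisors.

```python
WARP_SIZE = 32   # threads per warp
LOG2_WARP = 5    # 2**5 == 32

def lane_and_warp(thread_idx):
    """Lane within the warp and warp number, using a mask and a shift
    instead of % and //."""
    lane = thread_idx & (WARP_SIZE - 1)   # same as thread_idx % 32
    warp = thread_idx >> LOG2_WARP        # same as thread_idx // 32
    return lane, warp

def round_up_to_warp(n_threads):
    """Smallest multiple of 32 that is >= n_threads."""
    return (n_threads + WARP_SIZE - 1) & ~(WARP_SIZE - 1)

print(lane_and_warp(70), round_up_to_warp(100))
```

In practice a compiler will often make this substitution itself when the divisor is a compile-time constant, so the manual form mainly matters when the divisor is only known to be a power of two at run time.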
  • 20. 3. SAR Processing
      • What is Synthetic Aperture Radar
      • SAR Processing
      • Processing Algorithms
      • Basic RDA
      • Simplified RDA
  • 21. What is Synthetic Aperture Radar
      • An active microwave remote sensing imaging system.
      • Employs long range propagation characteristics of radar and complex signal processing techniques to produce high resolution images.
      • High resolution achieved by synthesizing a long antenna aperture through signal processing techniques.
      • Pros (in comparison with optical systems):
          • All weather and day and night operation.
          • No effects of constituents of atmosphere.
          • Sensitivity to dielectric properties (can image ice, biomass etc.)
          • Sensitivity to surface roughness (oceans, wind
  • 22. What is Synthetic Aperture Radar
      • Pros (continued):
          • Accurate measurement of distance.
          • Sensitivity to man made objects.
          • Sensitivity to target structure.
          • Subsurface penetration.
      • Cons:
          • Complex interactions (difficult to visualize and understand).
          • Speckle effects (make visual interpretation difficult).
          • Topographic effects.
  • 23. SAR Processing
      • A set of procedures to obtain an interpretable image from raw data that is spread in the azimuth and range directions.
      • In range, the data is spread over the duration of the transmitted FM pulse.
      • In azimuth, the data is spread over the time a point target is illuminated by the radar beam.
      • SAR processing compresses this data, taking into account range cell migration, earth curvature, earth rotation and air/spacecraft attitude noise, to produce the final image.
      • Given the nature of the SAR system and its signals, signal processing rather than image processing provides the appropriate tools for SAR processing.
  • 24. SAR Processing Algorithms
      • Mainstream SAR processing algorithms include:
          • Range Doppler algorithm (RDA): High resolution images for low squint and for relatively smaller aperture sizes. Very popular.
          • Chirp scaling algorithm (CSA): Two-dimensional operations with range independence followed by range corrections in the range Doppler domain.
          • Omega-K algorithm (ωKA): Efficient and accurate in the two-dimensional frequency domain.
          • SPECAN algorithm: Good for medium to low resolution requirements.
  • 25. Range Doppler Algorithm
      • Versions of range Doppler:
          • Basic RDA
          • RDA with accurate SRC
          • RDA with approximate SRC
          • Simplified range Doppler
  • 26. Basic RDA
      • [Flowchart, in processing order:]
          • Raw data
          • Range compression: range FFT, matched filter multiply, range IFFT.
          • Azimuth FFT: data in range Doppler domain.
          • RCMC: interpolation operation in range Doppler domain.
          • Azimuth compression: azimuth matched filter multiply.
          • Azimuth IFFT and look summation: to bring the signal back into the time domain.
          • Final image
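The range compression step (FFT, matched filter multiply, IFFT) can be sketched on a single toy range line. This is an illustration under stated assumptions, not the thesis code: the pulse length, chirp rate and target delay are made-up values, and a naive O(N²) DFT written with the standard library stands in for a real FFT.

```python
import cmath

def dft(x, inverse=False):
    """Naive discrete Fourier transform; the inverse includes the 1/N factor."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = sum(x[m] * cmath.exp(sign * 2j * cmath.pi * k * m / n) for m in range(n))
        out.append(acc / n if inverse else acc)
    return out

N = 64             # samples in the range line (hypothetical)
PULSE_LEN = 16     # samples in the transmitted pulse (hypothetical)
DELAY = 20         # sample index where the echo starts (hypothetical)
CHIRP_RATE = 0.02  # cycles per sample squared (hypothetical)

# Baseband linear FM pulse, centred on the middle of the pulse.
pulse = [cmath.exp(1j * cmath.pi * CHIRP_RATE * (n - PULSE_LEN / 2) ** 2)
         for n in range(PULSE_LEN)]

# Received line: one point target, i.e. a delayed copy of the pulse.
echo = [0j] * N
for n, sample in enumerate(pulse):
    echo[DELAY + n] = sample

# Matched filter in the frequency domain: conjugate of the pulse spectrum.
reference = pulse + [0j] * (N - PULSE_LEN)
matched_filter = [v.conjugate() for v in dft(reference)]

# Range FFT, matched filter multiply, range IFFT.
spectrum = dft(echo)
compressed = dft([s * h for s, h in zip(spectrum, matched_filter)], inverse=True)

peak = max(range(N), key=lambda n: abs(compressed[n]))
print(peak, round(abs(compressed[peak]), 3))
```

The spread-out echo collapses to a peak at the target delay, with a magnitude equal to the pulse energy; azimuth compression applies the same idea along the other axis.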
  • 27. Simplified RDA
      • For narrower swath width and medium resolution requirements, RCM can be assumed independent of range.
      • [Flowchart, in processing order:]
          • Raw data
          • Pre-filtering: to remove the Doppler centroid.
          • Range compression: range FFT, matched filter multiply (no range IFFT).
          • Azimuth FFT: both range and azimuth in frequency domain.
          • RCMC: RCM phase function multiply with each range line.
          • Range IFFT: data in range Doppler domain.
          • Azimuth compression
          • Azimuth IFFT and look summation
          • Final image
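Doing RCMC as a phase multiply works because a linear phase ramp in the frequency domain is a shift in the time domain, so no interpolation is needed. The sketch below demonstrates only that shift property on one toy range line; the actual RCM phase function used in the thesis is not given on the slide, and `shift_by_phase`, the line length and the shift value are hypothetical.

```python
import cmath

def dft(x, inverse=False):
    """Naive discrete Fourier transform; the inverse includes the 1/N factor."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = sum(x[m] * cmath.exp(sign * 2j * cmath.pi * k * m / n) for m in range(n))
        out.append(acc / n if inverse else acc)
    return out

def shift_by_phase(line, shift):
    """Move every sample `shift` cells towards index 0 (circularly) by multiplying
    the spectrum with a linear phase ramp. Integer shifts only in this sketch."""
    n = len(line)
    spectrum = dft(line)
    ramp = [cmath.exp(2j * cmath.pi * k * shift / n) for k in range(n)]
    return dft([s * r for s, r in zip(spectrum, ramp)], inverse=True)

N = 32
line = [0j] * N
line[12] = 1 + 0j          # target energy has migrated to range cell 12

corrected = shift_by_phase(line, 3)   # undo a migration of 3 cells
peak = max(range(N), key=lambda i: abs(corrected[i]))
print(peak)
```

In the simplified algorithm the amount of shift would vary with azimuth frequency but, by the stated assumption, not with range, which is what lets a single phase function be applied to whole range lines.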
  • 28. 4. Implementation
      • Hardware resources
      • Software resources
      • CPU Implementation
      • MATLAB GPU Implementation
      • CUDA Implementation
      • Result Comparison
  • 29. Hardware resources
      GPU
          Name                        NVIDIA Tesla C1060
          # of cores                  240
          SP clock                    1.296 GHz
          Memory                      4 GB GDDR3
          Maximum memory bandwidth    102 GB/s
          Memory interface            512 bit – PCI Express
          GFLOPS                      933 single precision, 78 double precision
      CPU
          Name                        Intel Xeon E5504
          CPU clock                   2 GHz
          # of cores                  4
          System memory               4 GB DDR3
          Clock                       800 MHz
          Maximum memory bandwidth    19.2 GB/s
          Memory type                 DDR3 PC3
          PCI slot                    PCI Express
  • 30. Software resources
      • CPU
          • Windows 7 Ultimate 64-bit
          • MATLAB release 2010b
      • GPU
          • Visual Studio 2008 SP1
          • CUDA Toolkit 4.1
          • MATLAB release 2010b
          • NVIDIA Parallel Nsight
          • Visual Profiler
          • CUDA MEMCHECK
          • CUFFT library
  • 31. RADARSAT-1 Data
      • CEOS format.
      • Raw data must be extracted from the CEOS data before the SAR processing algorithm can be applied.
      • [Figure: CEOS data → CEOS data extraction utility → raw SAR data]
      Table: RADARSAT-1 data parameters
          Parameter                     Value      Units
          Sampling rate                 32.317     MHz
          Range FM rate                 0.72135    MHz/µs
          Pulse duration                41.74      µs
          Radar frequency               5.3        GHz
          Radar wavelength              0.05657    m
          Pulse repetition frequency    1256.98    Hz
          Effective radar velocity      7062       m/s
          Azimuth FM rate               1733       Hz/s
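A few quantities that the processing steps depend on follow directly from the table. The sketch below derives them with textbook relations; the derived values and the unweighted resolution formula c/(2B) are my own arithmetic and do not come from the slides.

```python
C = 299_792_458.0                      # speed of light, m/s

# Values from the table, converted to SI units.
sampling_rate_hz = 32.317e6
range_fm_rate_hz_per_s = 0.72135e12    # 0.72135 MHz/us
pulse_duration_s = 41.74e-6
radar_frequency_hz = 5.3e9

wavelength_m = C / radar_frequency_hz
chirp_bandwidth_hz = range_fm_rate_hz_per_s * pulse_duration_s
samples_per_pulse = sampling_rate_hz * pulse_duration_s
# Assumption: ideal slant-range resolution with no window broadening.
slant_range_resolution_m = C / (2 * chirp_bandwidth_hz)

print(round(wavelength_m, 5))
print(round(chirp_bandwidth_hz / 1e6, 2), "MHz")
print(round(samples_per_pulse), "samples")
print(round(slant_range_resolution_m, 2), "m")
```

The computed wavelength agrees with the tabulated value to within rounding, and the pulse occupies roughly 1349 samples, which sets the length of the range matched filter.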
  • 32. SAR Processing GUI
      • Functions:
          • CEOS data extraction.
          • MATLAB-CPU SAR processing.
          • MATLAB-GPU SAR processing.
          • CUDA input/output manipulation.
          • CUDA program execution.
  • 33. CPU Implementation
      • Implemented using MATLAB.
      • FFT/IFFT using standard MATLAB functions.
  • 34. CPU Processed SAR image
      • [Figure: a 2048 x 4096 SAR image produced by the CPU based implementation]
  • 35. MATLAB-GPU Implementation
      • MATLAB has supported GPU computing since release 2010b.
      • Implemented using native MATLAB-GPU functions only (no CUDA kernel calls).
      • Vectorization strategy employed to implement vector-matrix multiplications on the GPU.
      • All FFT/IFFTs performed using MATLAB-GPU FFT/IFFT support functions.
      • [Figure: a vector replicated across Column 1, Column 2, ..., Column n of a matrix]
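One reading of the vectorization strategy is that a filter vector is first replicated into a full matrix so that the multiply becomes a single element-wise matrix operation. The sketch below illustrates that reading with plain Python lists; it is not the MATLAB code from the thesis, and `replicate_columns` and `elementwise_multiply` are hypothetical helpers.

```python
def replicate_columns(vector, n_cols):
    """Expand a length-R vector into an R x n_cols matrix, one copy per column."""
    return [[v] * n_cols for v in vector]

def elementwise_multiply(a, b):
    """Element-by-element product of two equally sized matrices."""
    return [[x * y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]

signal = [[1 + 1j, 2 + 0j, 3 - 1j],
          [0 + 1j, 1 + 0j, 2 + 2j]]
filter_vector = [2 + 0j, 0 + 1j]        # one coefficient per row

filter_matrix = replicate_columns(filter_vector, n_cols=3)
result = elementwise_multiply(signal, filter_matrix)
print(result)
```

The replicated matrix is as large as the signal itself, which is the extra memory cost of this approach.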
  • 36. MATLAB-GPU Implementation
      • The maximum image size that can be processed is limited by GPU memory constraints.
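A rough feel for the memory constraint comes from counting bytes. The sketch assumes single-precision complex samples (8 bytes each) and a working set of three full-size matrices (signal, replicated filter, result); both are assumptions, and the real limit also depends on FFT workspace and MATLAB's own overhead.

```python
BYTES_PER_COMPLEX_SINGLE = 8            # assumption: 4-byte real + 4-byte imaginary
GPU_MEMORY_BYTES = 4 * 1024 ** 3        # 4 GB board memory, from the hardware slide

def matrix_bytes(rows, cols):
    """Storage for one rows x cols matrix of single-precision complex samples."""
    return rows * cols * BYTES_PER_COMPLEX_SINGLE

image_bytes = matrix_bytes(2048, 4096)
# Assumption: signal, replicated filter matrix and result are all resident at once.
working_set_bytes = 3 * image_bytes

print(image_bytes // 2 ** 20, "MiB per matrix")
print(working_set_bytes // 2 ** 20, "MiB working set")
print(GPU_MEMORY_BYTES // image_bytes, "such matrices fit in 4 GB")
```

Each doubling of both image dimensions quadruples these figures, so the ceiling is reached after only a few doublings.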
  • 37. MATLAB-GPU Implementation
      • Speedups of up to 21 times achieved compared with the CPU implementation.
  • 38. MATLAB-GPU Implementation
      • [Figure: a 2048 x 4096 SAR image produced by the MATLAB-GPU based implementation]
  • 39. MATLAB-GPU Implementation
      • Advantages
          • Quick and easy to implement.
          • Sufficient speedups obtained with little effort.
          • Little knowledge of GPU hardware and no knowledge of optimization techniques required.
      • Disadvantages
          • Currently, a limited number of MATLAB functions are supported on the GPU.
          • Not all overloads of a function are available for the GPU.
          • Less control of hardware resources and memory.
          • Not many optimization options.
  • 40. CUDA Implementation
      • Strategy
          • Signal data read as a binary file.
          • Vectors and matched filters calculated on the CPU.
          • Vectors and signal data transferred to the GPU.
          • Following kernels executed in order on the GPU:
              • Pre-filtering kernel
              • Range compression kernel
              • RCMC kernel
              • Azimuth compression kernel
              • Image pixel calculation kernel
          • Data transferred from GPU to CPU and saved on disk as an image.
  • 41. Optimization considerations
      • Chosen block size = 8 × 8 = 64 threads. Conforms with memory coalescing requirements.
      • Constant variables stored in constant memory.
      • Local variables and phase functions calculated in the kernel whenever possible to reduce global memory access.
      • CPU-GPU data transfer kept to a minimum: data is transferred from CPU to GPU at the beginning and from GPU to CPU at the end of the algorithm.
      • CUFFT's cufftPlanMany() plan used for FFT/IFFTs along data columns.
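With an 8 × 8 block, the launch grid for a full image follows from a ceiling division in each dimension. A minimal sketch, assuming one thread per pixel and a 2048-row by 4096-column layout; neither assumption is confirmed by the slides.

```python
def ceil_div(a, b):
    """Integer division rounded up."""
    return (a + b - 1) // b

BLOCK_X, BLOCK_Y = 8, 8        # block size from the slide
ROWS, COLS = 2048, 4096        # assumed image layout

grid_x = ceil_div(COLS, BLOCK_X)
grid_y = ceil_div(ROWS, BLOCK_Y)
threads_per_block = BLOCK_X * BLOCK_Y
total_threads = grid_x * grid_y * threads_per_block

print((grid_x, grid_y), threads_per_block, total_threads)
```

Sixty-four threads per block is two full warps, in line with the multiple-of-32 guideline, and for these dimensions the grid covers the image exactly with no idle threads.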
  • 42. CUDA Implementation Results
      • [Figure: a 2048 x 4096 SAR image produced by the CUDA based implementation]
  • 47. MATLAB-GPU/CUDA speedup comparison
      • CUDA achieved speedups of up to 53 times, compared with a maximum speedup of 21 times for MATLAB-GPU.
  • 48. 5. Conclusions & Future Work
  • 49. Conclusions
      • Feasibility of GPU for SAR processing
          • The amount of data, the computational effort and the inherent algorithm parallelism make SAR processing well suited to the GPU.
          • The TESLA C1060 GPU offers enough memory to handle various common SAR image sizes.
          • Cooling the GPU may be a challenge in some environments.
          • Scalability of CUDA will be an advantage when porting existing SAR code to newer GPUs.
          • GPUs might not be suitable where customizable hardware is required or military hardware standards must be adhered to.
  • 50. Conclusions
      • MATLAB-GPU based SAR processing
          • Significant speedups compared with CPU.
          • Quick and easy to implement.
          • Has some limitations:
              • Currently fewer functions are supported on the GPU. Expected to improve with future MATLAB releases.
              • The vectorization strategy needs more memory. Future releases promise to remove the need for vectorization (e.g. bsxfun in release 2012a).
              • Less control over GPU resources (memory etc.).
      • CUDA SAR processing
          • CUDA: Flexible and scalable with a minimal learning curve.
          • More control over GPU resources.
          • Optimization strategies can be applied.
          • Faster and more memory efficient than the MATLAB implementation.
  • 51. Conclusions
      • Downsides of GPU
          • Significant testing/verification effort might be required if the GPU hardware has to be upgraded (because the old hardware becomes obsolete).
          • The proprietary nature of CUDA might be problematic if the company discontinues CUDA or its support.
  • 52. Future work
      • CUDA kernels can be called from MATLAB code using MATLAB's CUDA kernel calling support.
      • The MATLAB-GPU implementation can be improved as newer and better functions become available.
      • A C/C++ based CPU implementation can be developed to better judge MATLAB-CPU/CUDA performance.
      • Other SAR processing algorithms can be implemented using the framework laid out in this project.
  • 53. Q & A