Survey of using GPU CUDA
programming model in medical
image analysis
Armin shoughi
May 2019
Introduction
Computed tomography (CT), magnetic resonance imaging (MRI), positron emission
tomography (PET) and ultrasound are widely used medical imaging modalities.
They produce 2D, 3D and 4D images that guide diagnosis and treatment planning.
A GPU is a highly parallel, multithreaded, many-core processor with high memory
bandwidth, which makes it well suited to these computationally demanding
problems.
Overview of GPU computing model – CUDA
• NVIDIA introduced its massively parallel architecture, Compute
Unified Device Architecture (CUDA), in 2006, changing the GPU
programming model.
• CUDA is an extension of the C programming language; the platform
is proprietary to NVIDIA rather than open source.
• A CUDA program contains phases that are executed on either the host
(CPU) or the device (GPU).
• Phases with little or no data parallelism run in host code;
data-parallel phases run in device code.
GPU - CUDA architecture
1. programming model
2. memory model
3. CUDA work flow
GPU - CUDA programming model
The CUDA programming model is built from three main parts that let a
program use the full computational capability of the GPU: grids,
blocks and threads. A kernel launch creates a grid of thread blocks,
and each block contains a group of threads.
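As a rough illustration of how grids, blocks and threads relate, the
sketch below emulates on the CPU, in plain Python, the index arithmetic
that a 1D CUDA kernel typically performs (blockIdx.x * blockDim.x +
threadIdx.x). The function names and sizes are hypothetical and chosen
only for this example; a real kernel would be written in CUDA C and run
on the device.

```python
def global_thread_ids(grid_dim, block_dim):
    """Enumerate (block, thread, global index) for a 1D grid (CPU emulation)."""
    ids = []
    for block_idx in range(grid_dim):          # blocks in the grid
        for thread_idx in range(block_dim):    # threads in each block
            # Same arithmetic as blockIdx.x * blockDim.x + threadIdx.x
            ids.append((block_idx, thread_idx,
                        block_idx * block_dim + thread_idx))
    return ids


def blocks_needed(n_elements, block_dim):
    """Smallest grid size so that every element gets its own thread."""
    return (n_elements + block_dim - 1) // block_dim


if __name__ == "__main__":
    n = 10
    block_dim = 4
    grid_dim = blocks_needed(n, block_dim)     # 3 blocks -> 12 threads
    for block_idx, thread_idx, gid in global_thread_ids(grid_dim, block_dim):
        status = "works on element" if gid < n else "idle (out of range)"
        print(block_idx, thread_idx, gid, status)
```

Rounding the grid size up means the last block may contain surplus
threads, which is why kernels usually guard their work with a bounds
check on the global index.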
GPU - CUDA memory model
A GPU has M streaming multiprocessors (SMs), each with N streaming
processor cores (SPs). Each thread can access variables in its own
local memory and registers. Registers have the highest bandwidth, so
frequently accessed variables are stored there.
CUDA work flow model
A CUDA program starts executing on the host. Before the kernel is
launched, all necessary data is transferred from the host to allocated
device memory. The kernel function then generates a large number of
threads to exploit data parallelism. After the kernel finishes, the
results are typically copied back from device memory to the host.
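The work flow can be sketched as three steps: copy in, launch, copy
out. The Python code below only emulates these steps on the CPU, with
lists standing in for host and device memory; the names scale_kernel
and run_on_device are hypothetical. In CUDA C the corresponding steps
would roughly be cudaMalloc and cudaMemcpy, a kernel launch, and a
second cudaMemcpy back to the host.

```python
def scale_kernel(gid, device_in, device_out, factor):
    """Per-thread work: each emulated thread handles one element."""
    if gid < len(device_in):                 # guard against surplus threads
        device_out[gid] = device_in[gid] * factor


def run_on_device(host_data, factor, block_dim=4):
    # 1. Allocate "device" buffers and copy the input from host to device.
    device_in = list(host_data)
    device_out = [0] * len(host_data)
    # 2. Launch: one emulated thread per element, grouped into blocks.
    grid_dim = (len(host_data) + block_dim - 1) // block_dim
    for block_idx in range(grid_dim):
        for thread_idx in range(block_dim):
            scale_kernel(block_idx * block_dim + thread_idx,
                         device_in, device_out, factor)
    # 3. Copy the result back from device to host.
    return list(device_out)


if __name__ == "__main__":
    print(run_on_device([1, 2, 3, 4, 5], 2))
```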
GPU computation for medical
image analysis
• Denoising
• Registration
• Segmentation
• Visualization
Image denoising
Image denoising is an important task in medical imaging
applications: it enhances the data and recovers hidden details.
Solving this problem can improve diagnosis and surgical
procedures.
The most commonly used denoising algorithms in the medical
domain are:
• adaptive filtering
• anisotropic diffusion
• bilateral filtering
• non-local means filter
Adaptive filtering
This denoising approach uses the adaptive filter introduced by
Knutsson et al. in 1983.
An adaptive filter is a self-modifying digital filter that adjusts its
filter coefficients in an attempt to minimize an error function.
Adaptive filtering is a direct method that does not need to be
iterated.
Anisotropic diffusion
The anisotropic diffusion filter is an iterative algorithm introduced
by Perona and Malik in 1987.
It aims to reduce image noise without degrading significant parts of
the image content, such as edges, regions, lines or other details.
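A minimal CPU sketch of one common discretization of Perona-Malik
diffusion is given below, assuming a 2D image stored as a list of
rows, a 4-neighbor stencil and the exponential conduction function
exp(-(d / kappa)^2). The parameter names (kappa, step) and their
default values are assumptions made for this example, not values from
the survey. On a GPU, the two inner loops over pixels would typically
be replaced by one thread per pixel.

```python
import math


def perona_malik(image, iterations=10, kappa=15.0, step=0.2):
    """Iteratively smooth a 2D image while limiting diffusion across edges."""
    rows, cols = len(image), len(image[0])
    current = [[float(v) for v in row] for row in image]
    for _ in range(iterations):
        updated = [row[:] for row in current]
        for r in range(rows):
            for c in range(cols):
                flux = 0.0
                for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                    nr, nc = r + dr, c + dc
                    if 0 <= nr < rows and 0 <= nc < cols:
                        diff = current[nr][nc] - current[r][c]
                        # Conduction is near 1 for small differences
                        # (smoothing) and near 0 for large ones (edges).
                        flux += math.exp(-(diff / kappa) ** 2) * diff
                # step <= 0.25 keeps the 4-neighbor update stable.
                updated[r][c] = current[r][c] + step * flux
        current = updated
    return current
```

Every pixel reads only the previous iteration's values, so all pixels
in one iteration are independent of each other; the iterations
themselves remain sequential.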
Registration
Medical image registration determines the spatial
alignment between a reference image and a spatially
transformed (moving) image.
The two images may be acquired from the same modality
or from different modalities.
Two popular registration algorithms are:
• block matching algorithm (BMA)
• rigid transformation estimation (RTE)
Block matching algorithm
The block-matching algorithm (BMA) is among the most popular
methods for motion estimation from an image sequence.
It splits an image into blocks and estimates a
displacement for each block.
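The idea can be sketched as an exhaustive search, assuming 2D images
stored as lists of rows and the sum of absolute differences (SAD) as
the similarity measure. The survey does not fix a particular measure,
so SAD, the block size and the search range are assumptions made for
this example, and the function names are hypothetical. Each block is
processed independently, which is what makes the method attractive for
a GPU.

```python
def sad(ref, mov, top, left, dy, dx, size):
    """Sum of absolute differences between a reference block and a shifted block."""
    total = 0
    for r in range(size):
        for c in range(size):
            total += abs(ref[top + r][left + c]
                         - mov[top + dy + r][left + dx + c])
    return total


def block_matching(ref, mov, block=4, search=2):
    """Return {(top, left): (dy, dx)} with the best displacement per block."""
    rows, cols = len(ref), len(ref[0])
    vectors = {}
    for top in range(0, rows - block + 1, block):
        for left in range(0, cols - block + 1, block):
            best = None
            for dy in range(-search, search + 1):
                for dx in range(-search, search + 1):
                    # Skip candidates that would leave the moving image.
                    if not (0 <= top + dy and top + dy + block <= rows
                            and 0 <= left + dx and left + dx + block <= cols):
                        continue
                    cost = sad(ref, mov, top, left, dy, dx, block)
                    # Prefer lower cost; break ties toward the smaller shift.
                    key = (cost, abs(dy) + abs(dx))
                    if best is None or key < best[0]:
                        best = (key, (dy, dx))
            vectors[(top, left)] = best[1]
    return vectors
```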
Rigid transformation estimation
Rigid transformation estimation (RTE) is one of the simplest forms of image
registration in medical imaging. RTE finds the transformation between the
reference and moving images using the displacement vectors produced by
the BMA. A rigid transformation (T) represents the linear and/or angular
displacement of a rigid body, that is, a translation and/or a rotation.
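A minimal sketch of one way to estimate T in 2D is shown below,
assuming a plain least-squares fit of a rotation angle and a
translation to the block positions and their displacement vectors.
This closed-form fit and the name estimate_rigid_2d are assumptions
made for illustration; practical pipelines often use a robust
estimator instead, so that outlier vectors from the BMA do not
dominate the result.

```python
import math


def estimate_rigid_2d(points, displacements):
    """Least-squares rotation angle and translation mapping each point
    to point + displacement."""
    targets = [(x + dx, y + dy)
               for (x, y), (dx, dy) in zip(points, displacements)]
    n = len(points)
    pcx = sum(x for x, _ in points) / n      # centroid of source points
    pcy = sum(y for _, y in points) / n
    qcx = sum(x for x, _ in targets) / n     # centroid of target points
    qcy = sum(y for _, y in targets) / n
    dot = cross = 0.0
    for (px, py), (qx, qy) in zip(points, targets):
        ax, ay = px - pcx, py - pcy          # centred source point
        bx, by = qx - qcx, qy - qcy          # centred target point
        dot += ax * bx + ay * by
        cross += ax * by - ay * bx
    theta = math.atan2(cross, dot)           # optimal rotation angle
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    # Translation that maps the rotated source centroid onto the target one.
    tx = qcx - (cos_t * pcx - sin_t * pcy)
    ty = qcy - (sin_t * pcx + cos_t * pcy)
    return theta, (tx, ty)
```

Here points would be the block centres and displacements the vectors
returned by the block-matching step.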
Segmentation
Many segmentation methods are computationally expensive
when run on the large datasets produced by medical
imaging modalities.
In medical imaging, segmentation is often used to
delineate brain structures, blood vessels, tumors and bones.
Well-known segmentation methods include:
• thresholding
• region growing
Thresholding
Thresholding segments each pixel or voxel by comparing it
against one or more threshold values. It is one of the
simplest techniques to parallelize over the data: one thread
per voxel in a 3D image, or one thread per pixel in a 2D
image.
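The one-thread-per-voxel mapping can be sketched as follows, assuming
a flattened image and a single intensity window [low, high]. The
Python loop stands in for the kernel launch, and the function names
are hypothetical; in CUDA, index would come from the block and thread
indices rather than from a loop variable.

```python
def threshold_kernel(index, voxels, labels, low, high):
    """Work of one emulated thread: label a single pixel or voxel."""
    labels[index] = 1 if low <= voxels[index] <= high else 0


def threshold_image(voxels, low, high):
    """Label every element of a flattened image; 1 means inside the window."""
    labels = [0] * len(voxels)
    for index in range(len(voxels)):       # stands in for the thread grid
        threshold_kernel(index, voxels, labels, low, high)
    return labels


if __name__ == "__main__":
    print(threshold_image([10, 120, 300, 80, 200], 100, 250))
```

No element depends on any other, so the threads need no
synchronization or communication.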
Region growing
Region growing is a commonly used medical image
segmentation technique. It starts from an initial seed point
inside the object, chosen either manually or automatically
using prior knowledge. The region then grows by adding
neighboring pixels or voxels that satisfy a similarity criterion.
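A minimal CPU sketch of region growing is given below, assuming a 2D
image, 4-connectivity and a simple similarity criterion: a pixel joins
the region if its intensity is within a tolerance of the seed
intensity. The criterion, the tolerance parameter and the name
region_grow are assumptions made for this example.

```python
from collections import deque


def region_grow(image, seed, tolerance):
    """Return a binary mask of the region connected to seed (row, col)."""
    rows, cols = len(image), len(image[0])
    seed_value = image[seed[0]][seed[1]]
    mask = [[0] * cols for _ in range(rows)]
    mask[seed[0]][seed[1]] = 1
    queue = deque([seed])                    # frontier of pixels to expand
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            nr, nc = r + dr, c + dc
            if (0 <= nr < rows and 0 <= nc < cols and not mask[nr][nc]
                    and abs(image[nr][nc] - seed_value) <= tolerance):
                mask[nr][nc] = 1             # accept and expand later
                queue.append((nr, nc))
    return mask
```

Unlike thresholding, each step depends on the region found so far,
which makes the method harder to map directly onto one thread per
pixel.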
Visualization
Combining medical image processing with visualization opens new
ways to diagnose disease and to evaluate the effect of a patient's
treatment more accurately and reliably by computer.
Image visualization techniques fall into two groups:
• surface rendering
• volume rendering
