Computing Using
Graphics Cards

Shree Kumar, Hewlett Packard
http://www.shreekumar.in/
Speaker Intro

• High Performance Computing @ Hewlett‐Packard
  – VizStack (http://vizstack.sourceforge.net)
  – GPU Computing
• Big 3D enthusiast
• Travels a lot
• Blogs at http://www.shreekumar.in/
What we will cover

•   GPUs and their history
•   Why use GPUs
•   Architecture
•   Getting Started with GPU Programming
•   Challenges, Techniques & Pitfalls
•   Where not to use GPUs?
•   Resources
•   The Future
What is a GPU

• Graphics Processing Unit
   – Term coined by NVidia in 1999
   – Specialized add‐on board
• Accelerates interactive 3D rendering
   – 60 or more image updates per second on large datasets
   – Solves an embarrassingly parallel problem
   – Game-driven volume economics
       • NVidia vs. ATI, just like Intel vs. AMD
• Demand for better effects led to
   – programmable GPUs
   – floating-point capabilities
   – and, in turn, General-Purpose GPU (GPGPU) computation
History of GPUs: A GPGPU Perspective
Year  Product          Transistors  Cores  Peak FLOPS (SP / DP)  Technology
1997  RIVA 128         3 M          -      -                     Rasterization
1999  GeForce 256      25 M         -      -                     Transform & Lighting
2001  GeForce 3        60 M         -      -                     Programmable shaders
2002  GeForce FX       125 M        -      -                     16- and 32-bit FP, long shaders
2004  GeForce 6800     222 M        -      -                     Infinite-length shaders, branching
2006  GeForce 8800     681 M        128    -                     Unified graphics & compute, CUDA, 64-bit FP
2008  GeForce GTX 280  1.4 B        240    933 G / 78 G          IEEE FP, CUDA C, OpenCL and DirectCompute, PCI Express Gen 2
2009  Tesla M2050      3.0 B        512    1.03 T / 515 G        Improved 64-bit performance, caching, ECC memory, 64-bit unified addressing, asynchronous bidirectional data transfer, multiple kernels
Source: Nickolls J., Dally W.J., “The GPU Computing Era”, IEEE Micro, March-April 2010
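The peak figures in the table follow from cores × clock × floating-point operations per core per clock. A minimal sketch, assuming the commonly quoted GeForce GTX 280 shader clock of 1.296 GHz, 3 single-precision flops per core per clock (a dual-issued multiply-add plus multiply), and 30 double-precision units each doing one multiply-add (2 flops) per clock. None of these parameters appear on the slide, and `peak_gflops` is a hypothetical helper:

```c
/* Peak throughput in GFLOPS = cores x clock (GHz) x flops per core per clock.
   The inputs are assumptions about a specific GPU, not values from the slide. */
double peak_gflops(int cores, double clock_ghz, int flops_per_clock) {
    return cores * clock_ghz * flops_per_clock;
}
```

With these assumptions, `peak_gflops(240, 1.296, 3)` gives about 933 (the single-precision figure in the table) and `peak_gflops(30, 1.296, 2)` about 78 (double precision). These are theoretical peaks; real applications usually sustain less.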
The GPU Advantage

• 30x the peak FLOPS of a CPU on the latest GPUs
• 10x the memory bandwidth
• Add to these 3x the performance per dollar
• Energy efficient: 5x the performance per watt
All graphs from: GPU4Vision, http://gpu4vision.icg.tugrz.at/
People use GPUs for… (figure)
Source: Nickolls J., Dally W.J., “The GPU Computing Era”, IEEE Micro, March-April 2010
More reasons to use GPUs

• Proliferation of GPUs
   – Mobile devices will have capable GPUs soon!
• Make more things possible
   – Make things real-time
      • From seconds to real-time interactive performance
   – Reduce offline processing overhead
• Research opportunities
   – New & efficient algorithms
   – Pairing multi-core CPUs with massively multi-threaded GPUs
GPU Computing 1‐2‐3

1. A GPU isn’t a CPU replacement!
2. There ain’t no such thing as a FREE lunch!
3. You don’t always “port” a CPU algorithm to a GPU!
CPU versus GPU

• CPU
  – Optimized for latency
  – Speedup techniques
     • Vectorization (MMX, SSE, …)
     • Coarse-grained parallelism using multiple CPUs and cores
  – Memory capacity approaching a terabyte
• GPU
  – Optimized for throughput
  – Speedup techniques
     • Massive multithreading
     • Fine-grained parallelism
  – A few GB of memory at most
Getting Started

• Software
  – CUDA (NVidia-specific)
  – OpenCL (cross-platform, GPU/CPU)
  – DirectCompute (Microsoft-specific)
• Hardware
  – A system equipped with a GPU
• Any OS will do
  – But Windows and Red Hat Enterprise Linux seem to be better supported
CUDA
• Compute Unified Device Architecture
• The most popular GPGPU toolkit
• CUDA C extends C with a few constructs
     – Makes programs easy to write
• A lower-level “driver” API is available
     – Provides more control
     – Use multiple GPUs in the same application
     – Mix graphics & compute code
• Language bindings available
     – PyCUDA, Java, .NET
• Toolkit provides conveniences
Figure: CUDA Toolkit (source: NVIDIA CUDA Architecture, Introduction and Overview)
CUDA Architecture
• One or more streaming multiprocessors (SMs), each containing many CUDA cores
• Thread blocks
   – Single Instruction, Multiple Thread (SIMT)
   – Hide memory latency through parallelism
• Memory hierarchy
   – Fermi GPUs can access system memory
• Primitives for
   – Thread synchronization
   – Atomic operations on memory
Source: The GPU Computing Era
Simple Example: Vector Addition
C/C++ - serial code
void VecAdd(const float *A, const float *B, float *C, int N) {
  for (int i = 0; i < N; i++)
    C[i] = A[i] + B[i];
}
VecAdd(A, B, C, N);

C/C++ with OpenMP - thread-level parallelism (compile with e.g. gcc -fopenmp)
void VecAdd(const float *A, const float *B, float *C, int N) {
  // "parallel for" creates the thread team; a bare "omp for" outside
  // a parallel region would run the loop on a single thread
  #pragma omp parallel for
  for (int i = 0; i < N; i++)
    C[i] = A[i] + B[i];
}
VecAdd(A, B, C, N);
Vector Addition using CUDA
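CUDA C - element-level parallelism
__global__ void VecAdd(const float *A, const float *B, float *C, int N) {
  // Global index of this thread across the whole grid
  int i = blockDim.x * blockIdx.x + threadIdx.x;
  if (i < N)  // the last block may have more threads than remaining elements
    C[i] = A[i] + B[i];
}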


Invoking the function
size_t size = N * sizeof(float);
float *d_A, *d_B, *d_C;

// Allocate memory on the GPU
cudaMalloc((void**)&d_A, size);
cudaMalloc((void**)&d_B, size);
cudaMalloc((void**)&d_C, size);

// Copy the input arrays to the GPU
cudaMemcpy(d_A, A, size, cudaMemcpyHostToDevice);
cudaMemcpy(d_B, B, size, cudaMemcpyHostToDevice);

// Invoke the kernel: one thread per element, rounded up to whole blocks
int threadsPerBlock = 256;
int blocksPerGrid = (N + threadsPerBlock - 1) / threadsPerBlock;
VecAdd<<<blocksPerGrid, threadsPerBlock>>>(d_A, d_B, d_C, N);

// Copy the result back to main memory (waits for the kernel to finish)
cudaMemcpy(C, d_C, size, cudaMemcpyDeviceToHost);

// Free GPU memory
cudaFree(d_A);
cudaFree(d_B);
cudaFree(d_C);


Compilation
# nvcc vectorAdd.cu -I ../../common/inc
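To see how `blockDim.x * blockIdx.x + threadIdx.x` and the `if (i < N)` guard in the `VecAdd` kernel above cover every element exactly once, the launch can be emulated serially in plain C. This is a sketch only: `ThreadCoords`, `vec_add_thread` and `emulate_launch` are hypothetical names, only the x dimension is modelled, and running the threads one after another is valid here only because each thread writes a different element and never synchronizes.

```c
/* Hypothetical stand-in for CUDA's built-in blockDim.x, blockIdx.x, threadIdx.x. */
typedef struct {
    int blockDim;
    int blockIdx;
    int threadIdx;
} ThreadCoords;

/* Same body as the VecAdd kernel, written against the emulated coordinates. */
static void vec_add_thread(ThreadCoords c, const float *A, const float *B,
                           float *C, int N) {
    int i = c.blockDim * c.blockIdx + c.threadIdx;  /* global element index */
    if (i < N)                                      /* last block may overhang */
        C[i] = A[i] + B[i];
}

/* Serially "launches" the grid: every (block, thread) pair runs the body once.
   Returns the total number of threads launched. */
int emulate_launch(const float *A, const float *B, float *C, int N,
                   int threadsPerBlock) {
    int blocksPerGrid = (N + threadsPerBlock - 1) / threadsPerBlock;
    for (int b = 0; b < blocksPerGrid; b++) {
        for (int t = 0; t < threadsPerBlock; t++) {
            ThreadCoords c = { threadsPerBlock, b, t };
            vec_add_thread(c, A, B, C, N);
        }
    }
    return blocksPerGrid * threadsPerBlock;
}
```

With N = 1000 and 256 threads per block, 4 blocks (1024 threads) are launched; the guard stops the last 24 threads from touching memory past the end of the arrays.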
GPU Programming Challenges

• Need high “occupancy” (many active warps per multiprocessor) for best performance
• Extracting parallelism with limited resources
  – Limited Registers
  – Limited Shared Memory
• Preferred Approach
  – Small Kernels
  – Multiple Passes if needed
• Decompose Problem into Parallel Pieces
  – Write once, scale everywhere!
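Occupancy is the fraction of a multiprocessor's warp slots that are actually filled, and the registers and shared memory each block needs decide how many blocks fit at once. The sketch below is a simplified calculator under stated assumptions: the `FERMI_LIKE` limits (48 resident warps, 8 resident blocks, 32,768 registers and 48 KB of shared memory per SM) are roughly those of compute capability 2.x GPUs and should be checked against your GPU's documentation, and allocation granularity is ignored, so real occupancy can come out lower. `SmLimits` and `occupancy` are hypothetical names, not a CUDA API.

```c
/* Per-multiprocessor (SM) resource limits. */
typedef struct {
    int max_warps;     /* resident warps per SM */
    int max_blocks;    /* resident thread blocks per SM */
    int registers;     /* 32-bit registers per SM */
    int shared_bytes;  /* shared memory per SM, in bytes */
} SmLimits;

/* Assumed, roughly Fermi-class (compute capability 2.x) limits. */
static const SmLimits FERMI_LIKE = { 48, 8, 32768, 48 * 1024 };

static int min_int(int a, int b) { return a < b ? a : b; }

/* Simplified occupancy estimate in [0, 1]: resident warps / max warps.
   Assumes threads_per_block >= 1; ignores allocation granularity. */
double occupancy(SmLimits sm, int threads_per_block,
                 int regs_per_thread, int smem_per_block) {
    const int warp_size = 32;
    int warps_per_block = (threads_per_block + warp_size - 1) / warp_size;

    /* Start from the hardware block and warp limits, then apply each resource limit. */
    int blocks = min_int(sm.max_blocks, sm.max_warps / warps_per_block);
    if (regs_per_thread > 0)
        blocks = min_int(blocks, sm.registers / (regs_per_thread * threads_per_block));
    if (smem_per_block > 0)
        blocks = min_int(blocks, sm.shared_bytes / smem_per_block);

    return (double)(blocks * warps_per_block) / sm.max_warps;
}
```

Under these limits, 256-thread blocks using 20 registers per thread reach full occupancy (6 blocks, 48 warps), while 32 registers per thread drop it to about two thirds because only 4 blocks' worth of registers fit.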
GPU Programming

• Use Shared Memory when possible
   – Cooperation between threads in a block
   – Reduce access to global memory
• Reduce Data Transfer over the Bus
• It’s still a GPU!
   – use textures to your advantage
   – use vector data types if you can
• Watch out for GPU capability differences!
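Shared memory lets the threads of a block cooperate, and a sum over a large array is a classic case that also follows the "small kernels, multiple passes" advice. The C sketch below emulates that pattern serially: `shared[]` stands in for a `__shared__` array, each step of the tree would be separated by `__syncthreads()` on a GPU, and each pass of `reduce_sum` corresponds to one kernel launch. `BLOCK_SIZE`, `block_reduce` and `reduce_sum` are hypothetical names; this illustrates the data flow, not a tuned GPU kernel.

```c
#define BLOCK_SIZE 8  /* hypothetical; must be a power of two (real kernels often use 256) */

/* Emulates one thread block reducing its slice of `in` to a single value. */
static float block_reduce(const float *in, int n, int block) {
    float shared[BLOCK_SIZE];  /* stands in for a __shared__ array */

    /* Each "thread" t loads one element; the last block is padded with zeros. */
    for (int t = 0; t < BLOCK_SIZE; t++) {
        int i = block * BLOCK_SIZE + t;
        shared[t] = (i < n) ? in[i] : 0.0f;
    }
    /* Tree reduction: halve the number of active threads each step.
       On a GPU, __syncthreads() would separate these steps. */
    for (int stride = BLOCK_SIZE / 2; stride > 0; stride /= 2)
        for (int t = 0; t < stride; t++)
            shared[t] += shared[t + stride];

    return shared[0];
}

/* Multi-pass reduction: each pass (one kernel launch on a GPU) writes one
   partial sum per block until a single value remains. Overwrites buf.
   Writing in place is safe here only because blocks run one at a time and
   block b reads indices >= b * BLOCK_SIZE, which earlier blocks (writing
   buf[0..b-1]) never touch; a GPU version would use a separate output array. */
float reduce_sum(float *buf, int n) {
    while (n > 1) {
        int blocks = (n + BLOCK_SIZE - 1) / BLOCK_SIZE;
        for (int b = 0; b < blocks; b++)
            buf[b] = block_reduce(buf, n, b);
        n = blocks;
    }
    return (n == 1) ? buf[0] : 0.0f;
}
```

Summing 100 values with `BLOCK_SIZE` 8 takes three passes (100 → 13 → 2 → 1 values).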
Enough Theory!

          Demo Time
              &
Let’s do some programming 
Watch out for

• Portability of programs across GPUs
   – Capabilities vary from GPU to GPU
   – Memory usage
• Arithmetic differences in results (GPU and CPU floating-point results may not match exactly)
• Pay careful attention to demos…
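One common source of arithmetic differences: floating-point addition is not associative, so a GPU reduction that adds in tree order can give a different answer from a CPU loop that adds left to right, even though both follow IEEE rounding rules. A minimal sketch, assuming IEEE single precision evaluated without extra precision (FLT_EVAL_METHOD 0, the default with SSE on x86-64); `sum_serial` and `sum_pairwise` are hypothetical names:

```c
/* Left-to-right sum, as a simple serial CPU loop computes it. */
float sum_serial(const float *x, int n) {
    float s = 0.0f;
    for (int i = 0; i < n; i++)
        s += x[i];  /* each partial sum is rounded to float */
    return s;
}

/* Pairwise (tree) sum: the order a parallel reduction typically uses. */
float sum_pairwise(const float *x, int n) {
    if (n == 0) return 0.0f;
    if (n == 1) return x[0];
    int half = n / 2;
    float left = sum_pairwise(x, half);
    float right = sum_pairwise(x + half, n - half);
    return left + right;
}
```

With `x = {1e8f, 1.0f, -1e8f, 1.0f}`, `sum_serial` returns 1 and `sum_pairwise` returns 0: near 1e8 the spacing between adjacent floats is 8, so `1e8f + 1.0f` rounds back to `1e8f`, and the two orders lose different 1s.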
Resources

• CUDA
  – Tools on NVIDIA Developer Site 
    http://developer.nvidia.com/object/gpucomputing.html
  – CUDPP 
    http://code.google.com/p/cudpp/
• OpenCL
• Google Search!
The Future

• Better throughput
   – More GPU cores, scaling by Moore’s law
   – PCIe Gen 3
• Easier to program
• Arbitrary control and data access patterns
Questions?

shree.shree@gmail.com

More Related Content

What's hot

Introduction to CUDA
Introduction to CUDAIntroduction to CUDA
Introduction to CUDA
Raymond Tay
 
Intro to GPGPU with CUDA (DevLink)
Intro to GPGPU with CUDA (DevLink)Intro to GPGPU with CUDA (DevLink)
Intro to GPGPU with CUDA (DevLink)
Rob Gillen
 
Cuda
CudaCuda
Cuda introduction
Cuda introductionCuda introduction
Cuda introductionHanibei
 
A beginner’s guide to programming GPUs with CUDA
A beginner’s guide to programming GPUs with CUDAA beginner’s guide to programming GPUs with CUDA
A beginner’s guide to programming GPUs with CUDAPiyush Mittal
 
Intro to GPGPU Programming with Cuda
Intro to GPGPU Programming with CudaIntro to GPGPU Programming with Cuda
Intro to GPGPU Programming with CudaRob Gillen
 
Vpu technology &gpgpu computing
Vpu technology &gpgpu computingVpu technology &gpgpu computing
Vpu technology &gpgpu computingArka Ghosh
 
Gpu with cuda architecture
Gpu with cuda architectureGpu with cuda architecture
Gpu with cuda architecture
Dhaval Kaneria
 
NVidia CUDA for Bruteforce Attacks - DefCamp 2012
NVidia CUDA for Bruteforce Attacks - DefCamp 2012NVidia CUDA for Bruteforce Attacks - DefCamp 2012
NVidia CUDA for Bruteforce Attacks - DefCamp 2012DefCamp
 
Introduction to CUDA C: NVIDIA : Notes
Introduction to CUDA C: NVIDIA : NotesIntroduction to CUDA C: NVIDIA : Notes
Introduction to CUDA C: NVIDIA : Notes
Subhajit Sahu
 
The Rise of Parallel Computing
The Rise of Parallel ComputingThe Rise of Parallel Computing
The Rise of Parallel Computing
bakers84
 
GIST AI-X Computing Cluster
GIST AI-X Computing ClusterGIST AI-X Computing Cluster
GIST AI-X Computing Cluster
Jax Jargalsaikhan
 
CUDA Architecture
CUDA ArchitectureCUDA Architecture
CUDA Architecture
Dr Shashikant Athawale
 
AI Hardware Landscape 2021
AI Hardware Landscape 2021AI Hardware Landscape 2021
AI Hardware Landscape 2021
Grigory Sapunov
 

What's hot (18)

Introduction to CUDA
Introduction to CUDAIntroduction to CUDA
Introduction to CUDA
 
Cuda tutorial
Cuda tutorialCuda tutorial
Cuda tutorial
 
Intro to GPGPU with CUDA (DevLink)
Intro to GPGPU with CUDA (DevLink)Intro to GPGPU with CUDA (DevLink)
Intro to GPGPU with CUDA (DevLink)
 
Cuda Architecture
Cuda ArchitectureCuda Architecture
Cuda Architecture
 
Cuda
CudaCuda
Cuda
 
Cuda introduction
Cuda introductionCuda introduction
Cuda introduction
 
A beginner’s guide to programming GPUs with CUDA
A beginner’s guide to programming GPUs with CUDAA beginner’s guide to programming GPUs with CUDA
A beginner’s guide to programming GPUs with CUDA
 
Intro to GPGPU Programming with Cuda
Intro to GPGPU Programming with CudaIntro to GPGPU Programming with Cuda
Intro to GPGPU Programming with Cuda
 
Vpu technology &gpgpu computing
Vpu technology &gpgpu computingVpu technology &gpgpu computing
Vpu technology &gpgpu computing
 
Gpu with cuda architecture
Gpu with cuda architectureGpu with cuda architecture
Gpu with cuda architecture
 
NVidia CUDA for Bruteforce Attacks - DefCamp 2012
NVidia CUDA for Bruteforce Attacks - DefCamp 2012NVidia CUDA for Bruteforce Attacks - DefCamp 2012
NVidia CUDA for Bruteforce Attacks - DefCamp 2012
 
Introduction to CUDA C: NVIDIA : Notes
Introduction to CUDA C: NVIDIA : NotesIntroduction to CUDA C: NVIDIA : Notes
Introduction to CUDA C: NVIDIA : Notes
 
The Rise of Parallel Computing
The Rise of Parallel ComputingThe Rise of Parallel Computing
The Rise of Parallel Computing
 
Cuda intro
Cuda introCuda intro
Cuda intro
 
GIST AI-X Computing Cluster
GIST AI-X Computing ClusterGIST AI-X Computing Cluster
GIST AI-X Computing Cluster
 
CUDA Architecture
CUDA ArchitectureCUDA Architecture
CUDA Architecture
 
Tech Talk NVIDIA CUDA
Tech Talk NVIDIA CUDATech Talk NVIDIA CUDA
Tech Talk NVIDIA CUDA
 
AI Hardware Landscape 2021
AI Hardware Landscape 2021AI Hardware Landscape 2021
AI Hardware Landscape 2021
 

Viewers also liked

Extending Android with New Devices
Extending Android with New DevicesExtending Android with New Devices
Extending Android with New Devices
Shree Kumar
 
Switching on the fibre staff present 2010
Switching on the fibre staff present 2010Switching on the fibre staff present 2010
Switching on the fibre staff present 2010moranf
 
Calendario 3divisao hoquei
Calendario 3divisao hoqueiCalendario 3divisao hoquei
Calendario 3divisao hoquei
Mané Castilho
 
Shop and Awe
Shop and AweShop and Awe
Shop and Awe
Adriana Young
 
Breakfast with Beatrice by Andrea Olausson
Breakfast with Beatrice by Andrea OlaussonBreakfast with Beatrice by Andrea Olausson
Breakfast with Beatrice by Andrea OlaussonAdriana Young
 
Using Social Media for Nonprofits
Using Social Media for NonprofitsUsing Social Media for Nonprofits
Using Social Media for NonprofitsProfiles, Inc.
 
Verification 2006 1
Verification 2006 1Verification 2006 1
Verification 2006 1
hjbarten
 
台北縣政府農業局簡介_final
台北縣政府農業局簡介_final台北縣政府農業局簡介_final
台北縣政府農業局簡介_final
Nancy Xiao
 
Livro 3 leitura sem simbolo
Livro 3   leitura sem simboloLivro 3   leitura sem simbolo
Livro 3 leitura sem simbolo
eliane santos
 
Contract and its assential
Contract and its assentialContract and its assential
Contract and its assentialspicysugar
 
A horror story about me
A horror story about meA horror story about me
A horror story about meManohar Patil
 
Il mercato pubblicitario in un contesto postmoderno
Il mercato pubblicitario in un contesto postmodernoIl mercato pubblicitario in un contesto postmoderno
Il mercato pubblicitario in un contesto postmoderno
pginzaina
 
Diseño de envases, packaging
Diseño de envases, packagingDiseño de envases, packaging
Diseño de envases, packaging
Nieves dibujo
 
How i built my own irrigation controller
How i built my own irrigation controllerHow i built my own irrigation controller
How i built my own irrigation controller
Shree Kumar
 
Android Service Patterns
Android Service PatternsAndroid Service Patterns
Android Service PatternsShree Kumar
 
Android, without batteries
Android, without batteriesAndroid, without batteries
Android, without batteries
Shree Kumar
 
Woning in spanje
Woning in spanjeWoning in spanje
Woning in spanje
OsirisRojales
 

Viewers also liked (18)

Extending Android with New Devices
Extending Android with New DevicesExtending Android with New Devices
Extending Android with New Devices
 
Switching on the fibre staff present 2010
Switching on the fibre staff present 2010Switching on the fibre staff present 2010
Switching on the fibre staff present 2010
 
Calendario 3divisao hoquei
Calendario 3divisao hoqueiCalendario 3divisao hoquei
Calendario 3divisao hoquei
 
Shop and Awe
Shop and AweShop and Awe
Shop and Awe
 
Breakfast with Beatrice by Andrea Olausson
Breakfast with Beatrice by Andrea OlaussonBreakfast with Beatrice by Andrea Olausson
Breakfast with Beatrice by Andrea Olausson
 
Using Social Media for Nonprofits
Using Social Media for NonprofitsUsing Social Media for Nonprofits
Using Social Media for Nonprofits
 
Verification 2006 1
Verification 2006 1Verification 2006 1
Verification 2006 1
 
Passie voor Oranje
Passie voor OranjePassie voor Oranje
Passie voor Oranje
 
台北縣政府農業局簡介_final
台北縣政府農業局簡介_final台北縣政府農業局簡介_final
台北縣政府農業局簡介_final
 
Livro 3 leitura sem simbolo
Livro 3   leitura sem simboloLivro 3   leitura sem simbolo
Livro 3 leitura sem simbolo
 
Contract and its assential
Contract and its assentialContract and its assential
Contract and its assential
 
A horror story about me
A horror story about meA horror story about me
A horror story about me
 
Il mercato pubblicitario in un contesto postmoderno
Il mercato pubblicitario in un contesto postmodernoIl mercato pubblicitario in un contesto postmoderno
Il mercato pubblicitario in un contesto postmoderno
 
Diseño de envases, packaging
Diseño de envases, packagingDiseño de envases, packaging
Diseño de envases, packaging
 
How i built my own irrigation controller
How i built my own irrigation controllerHow i built my own irrigation controller
How i built my own irrigation controller
 
Android Service Patterns
Android Service PatternsAndroid Service Patterns
Android Service Patterns
 
Android, without batteries
Android, without batteriesAndroid, without batteries
Android, without batteries
 
Woning in spanje
Woning in spanjeWoning in spanje
Woning in spanje
 

Similar to Computing using GPUs

lecture11_GPUArchCUDA01.pptx
lecture11_GPUArchCUDA01.pptxlecture11_GPUArchCUDA01.pptx
lecture11_GPUArchCUDA01.pptx
ssuser413a98
 
Vpu technology &gpgpu computing
Vpu technology &gpgpu computingVpu technology &gpgpu computing
Vpu technology &gpgpu computingArka Ghosh
 
Vpu technology &gpgpu computing
Vpu technology &gpgpu computingVpu technology &gpgpu computing
Vpu technology &gpgpu computingArka Ghosh
 
Monte Carlo on GPUs
Monte Carlo on GPUsMonte Carlo on GPUs
Monte Carlo on GPUs
fcassier
 
Introduction to GPUs for Machine Learning
Introduction to GPUs for Machine LearningIntroduction to GPUs for Machine Learning
Introduction to GPUs for Machine Learning
Sri Ambati
 
Monte Carlo G P U Jan2010
Monte  Carlo  G P U  Jan2010Monte  Carlo  G P U  Jan2010
Monte Carlo G P U Jan2010
John Holden
 
Vpu technology &gpgpu computing
Vpu technology &gpgpu computingVpu technology &gpgpu computing
Vpu technology &gpgpu computingArka Ghosh
 
Introduction to Accelerators
Introduction to AcceleratorsIntroduction to Accelerators
Introduction to Accelerators
Dilum Bandara
 
Newbie’s guide to_the_gpgpu_universe
Newbie’s guide to_the_gpgpu_universeNewbie’s guide to_the_gpgpu_universe
Newbie’s guide to_the_gpgpu_universe
Ofer Rosenberg
 
gpuprogram_lecture,architecture_designsn
gpuprogram_lecture,architecture_designsngpuprogram_lecture,architecture_designsn
gpuprogram_lecture,architecture_designsn
ARUNACHALAM468781
 
Graphics processing uni computer archiecture
Graphics processing uni computer archiectureGraphics processing uni computer archiecture
Graphics processing uni computer archiecture
Haris456
 
Introduction to cuda geek camp singapore 2011
Introduction to cuda   geek camp singapore 2011Introduction to cuda   geek camp singapore 2011
Introduction to cuda geek camp singapore 2011
Raymond Tay
 
S0333 gtc2012-gmac-programming-cuda
S0333 gtc2012-gmac-programming-cudaS0333 gtc2012-gmac-programming-cuda
S0333 gtc2012-gmac-programming-cuda
mistercteam
 
lecture_GPUArchCUDA02-CUDAMem.pdf
lecture_GPUArchCUDA02-CUDAMem.pdflecture_GPUArchCUDA02-CUDAMem.pdf
lecture_GPUArchCUDA02-CUDAMem.pdf
Tigabu Yaya
 
Gpu perf-presentation
Gpu perf-presentationGpu perf-presentation
Gpu perf-presentation
GiannisTsagatakis
 
GPGPU Computation
GPGPU ComputationGPGPU Computation
GPGPU Computation
jtsagata
 
GPU for DL
GPU for DLGPU for DL
GPU for DL
Nikolay Karelin
 

Similar to Computing using GPUs (20)

lecture11_GPUArchCUDA01.pptx
lecture11_GPUArchCUDA01.pptxlecture11_GPUArchCUDA01.pptx
lecture11_GPUArchCUDA01.pptx
 
Introduction to GPU Programming
Introduction to GPU ProgrammingIntroduction to GPU Programming
Introduction to GPU Programming
 
Vpu technology &gpgpu computing
Vpu technology &gpgpu computingVpu technology &gpgpu computing
Vpu technology &gpgpu computing
 
Vpu technology &gpgpu computing
Vpu technology &gpgpu computingVpu technology &gpgpu computing
Vpu technology &gpgpu computing
 
Monte Carlo on GPUs
Monte Carlo on GPUsMonte Carlo on GPUs
Monte Carlo on GPUs
 
Introduction to GPUs for Machine Learning
Introduction to GPUs for Machine LearningIntroduction to GPUs for Machine Learning
Introduction to GPUs for Machine Learning
 
Monte Carlo G P U Jan2010
Monte  Carlo  G P U  Jan2010Monte  Carlo  G P U  Jan2010
Monte Carlo G P U Jan2010
 
Vpu technology &gpgpu computing
Vpu technology &gpgpu computingVpu technology &gpgpu computing
Vpu technology &gpgpu computing
 
Introduction to Accelerators
Introduction to AcceleratorsIntroduction to Accelerators
Introduction to Accelerators
 
Newbie’s guide to_the_gpgpu_universe
Newbie’s guide to_the_gpgpu_universeNewbie’s guide to_the_gpgpu_universe
Newbie’s guide to_the_gpgpu_universe
 
gpuprogram_lecture,architecture_designsn
gpuprogram_lecture,architecture_designsngpuprogram_lecture,architecture_designsn
gpuprogram_lecture,architecture_designsn
 
Graphics processing uni computer archiecture
Graphics processing uni computer archiectureGraphics processing uni computer archiecture
Graphics processing uni computer archiecture
 
Introduction to cuda geek camp singapore 2011
Introduction to cuda   geek camp singapore 2011Introduction to cuda   geek camp singapore 2011
Introduction to cuda geek camp singapore 2011
 
S0333 gtc2012-gmac-programming-cuda
S0333 gtc2012-gmac-programming-cudaS0333 gtc2012-gmac-programming-cuda
S0333 gtc2012-gmac-programming-cuda
 
Current Trends in HPC
Current Trends in HPCCurrent Trends in HPC
Current Trends in HPC
 
lecture_GPUArchCUDA02-CUDAMem.pdf
lecture_GPUArchCUDA02-CUDAMem.pdflecture_GPUArchCUDA02-CUDAMem.pdf
lecture_GPUArchCUDA02-CUDAMem.pdf
 
Gpu perf-presentation
Gpu perf-presentationGpu perf-presentation
Gpu perf-presentation
 
GPGPU Computation
GPGPU ComputationGPGPU Computation
GPGPU Computation
 
GPU for DL
GPU for DLGPU for DL
GPU for DL
 
GPU Programming
GPU ProgrammingGPU Programming
GPU Programming
 

Recently uploaded

20240607 QFM018 Elixir Reading List May 2024
20240607 QFM018 Elixir Reading List May 202420240607 QFM018 Elixir Reading List May 2024
20240607 QFM018 Elixir Reading List May 2024
Matthew Sinclair
 
Microsoft - Power Platform_G.Aspiotis.pdf
Microsoft - Power Platform_G.Aspiotis.pdfMicrosoft - Power Platform_G.Aspiotis.pdf
Microsoft - Power Platform_G.Aspiotis.pdf
Uni Systems S.M.S.A.
 
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 previewState of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
Prayukth K V
 
A tale of scale & speed: How the US Navy is enabling software delivery from l...
A tale of scale & speed: How the US Navy is enabling software delivery from l...A tale of scale & speed: How the US Navy is enabling software delivery from l...
A tale of scale & speed: How the US Navy is enabling software delivery from l...
sonjaschweigert1
 
Introduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - CybersecurityIntroduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - Cybersecurity
mikeeftimakis1
 
PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)
Ralf Eggert
 
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdfSmart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
91mobiles
 
Pushing the limits of ePRTC: 100ns holdover for 100 days
Pushing the limits of ePRTC: 100ns holdover for 100 daysPushing the limits of ePRTC: 100ns holdover for 100 days
Pushing the limits of ePRTC: 100ns holdover for 100 days
Adtran
 
Uni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems Copilot event_05062024_C.Vlachos.pdfUni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems S.M.S.A.
 
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdfFIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance
 
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdfFIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance
 
National Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practicesNational Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practices
Quotidiano Piemontese
 
SAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdf
SAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdfSAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdf
SAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdf
Peter Spielvogel
 
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
Neo4j
 
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptx
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptxSecstrike : Reverse Engineering & Pwnable tools for CTF.pptx
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptx
nkrafacyberclub
 
The Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and SalesThe Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and Sales
Laura Byrne
 
By Design, not by Accident - Agile Venture Bolzano 2024
By Design, not by Accident - Agile Venture Bolzano 2024By Design, not by Accident - Agile Venture Bolzano 2024
By Design, not by Accident - Agile Venture Bolzano 2024
Pierluigi Pugliese
 
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
James Anderson
 
Monitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR EventsMonitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR Events
Ana-Maria Mihalceanu
 
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
DanBrown980551
 

Recently uploaded (20)

20240607 QFM018 Elixir Reading List May 2024
20240607 QFM018 Elixir Reading List May 202420240607 QFM018 Elixir Reading List May 2024
20240607 QFM018 Elixir Reading List May 2024
 
Microsoft - Power Platform_G.Aspiotis.pdf
Microsoft - Power Platform_G.Aspiotis.pdfMicrosoft - Power Platform_G.Aspiotis.pdf
Microsoft - Power Platform_G.Aspiotis.pdf
 
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 previewState of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
 
A tale of scale & speed: How the US Navy is enabling software delivery from l...
A tale of scale & speed: How the US Navy is enabling software delivery from l...A tale of scale & speed: How the US Navy is enabling software delivery from l...
A tale of scale & speed: How the US Navy is enabling software delivery from l...
 
Introduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - CybersecurityIntroduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - Cybersecurity
 
PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)
 
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdfSmart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
 
Pushing the limits of ePRTC: 100ns holdover for 100 days
Pushing the limits of ePRTC: 100ns holdover for 100 daysPushing the limits of ePRTC: 100ns holdover for 100 days
Pushing the limits of ePRTC: 100ns holdover for 100 days
 
Uni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems Copilot event_05062024_C.Vlachos.pdfUni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems Copilot event_05062024_C.Vlachos.pdf
 
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdfFIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdf
 
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdfFIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
 
National Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practicesNational Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practices
 
SAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdf
SAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdfSAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdf
SAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdf
 
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
 
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptx
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptxSecstrike : Reverse Engineering & Pwnable tools for CTF.pptx
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptx
 
The Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and SalesThe Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and Sales
 
By Design, not by Accident - Agile Venture Bolzano 2024
By Design, not by Accident - Agile Venture Bolzano 2024By Design, not by Accident - Agile Venture Bolzano 2024
By Design, not by Accident - Agile Venture Bolzano 2024
 
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
 
Monitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR EventsMonitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR Events
 
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
 

Computing using GPUs

  • 2. Speaker Intro • High Performance Computing @ Hewlett‐Packard – VizStack (http://vizstack.sourceforge.net) – GPU Computing • Big 3D enthusiast • Travels a lot • Blogs at http://www.shreekumar.in/
  • 3. What we will cover • GPUs and their history • Why use GPUs • Architecture • Getting Started with GPU Programming • Challenges, Techniques & Pitfalls • Where not to use GPUs ? • Resources • The Future
  • 4. What is a GPU • Graphics Programming Unit – Coined in 1999 by NVidia – Specialized add‐on board • Accelerates interactive 3D rendering – 60 image updates (or more) on large data – Solves embarrassingly parallel problem – Game driven volume economics • NVidia v/s ATI, just like Intel v/s AMD • Demand for better effects led to – programmable GPUs – floating point capabilities – this led to General Purpose GPU(GPGPU) Computation
  • 5. History of GPUs : a GPGPU Perspective Date Product Trans Cores Flops Technology 1997 RIVA 128 3 M Rasterization 1999 GeForce 256 25 M Transform & Lighting 2001 GeForce 3 60 M Programmable shaders 2002 GeForce FX 125 M 16, 32 bit FP, long shaders 2004 GeForce 6800 222 M Infinite length shaders, branching 2006 GeForce 8800 681 M 128 Unified graphics & compute, CUDA,  64 bit FP 2008 GeForce GTX  1.4 B 240 933 G IEEE FP, CUDA C, OpenCL and  280 78 M DirectCompute, PCI‐express Gen 2 2009 Tesla M2050 3.0 B 512 1.03 T Improved 64 bit perf, caching, ECC  515 G memory, 64‐bit unified addressing,  asynchronous bidirectional data  transfer, multiple kernels Source : Nickolls J. , Dally W.J. “The GPU Computing Era”, IEEE Micro, March-April 2010
  • 6. The GPU Advantage • 30x the FLOPS of CPUs on the latest GPUs • 10x the memory bandwidth • On top of these, 3x performance per dollar • Energy efficient: 5x performance per watt • All graphs from GPU4Vision: http://gpu4vision.icg.tugrz.at/
  • 7. People use GPUs for… Source : Nickolls J. , Dally W.J. “The GPU Computing Era”, IEEE Micro, March-April 2010
  • 8. More “why to use GPUs” • Proliferation of GPUs – Mobile devices will have capable GPUs soon! • Make more things possible – Make things real‐time • From seconds to real‐time interactive performance – Reduce offline processing overhead • Research Opportunities – New & efficient algorithms – Pairing multi‐core CPUs and massively multi‐threaded GPUs
  • 12. CPU versus GPU • CPU – Optimized for latency – Speedup techniques • Vectorization (MMX, SSE, …) • Coarse Grained Parallelism using multiple CPUs and cores – Memory approaching a TB • GPU – Optimized for throughput – Speedup techniques • Massive multithreading • Fine grained parallelism – A few GBs of memory max
  • 13. Getting Started • Software – CUDA (NVIDIA-specific) – OpenCL (cross-platform, GPU/CPU) – DirectCompute (Microsoft-specific) • Hardware – A system equipped with a GPU • Any OS will do – but Windows and Red Hat Enterprise Linux seem better supported
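A quick way to confirm that "a system equipped with a GPU" is actually usable from CUDA is to ask the runtime what it sees. This is a minimal sketch, assuming the CUDA Toolkit is installed; it uses the runtime calls `cudaGetDeviceCount` and `cudaGetDeviceProperties`, while the file name, build line and the `toMiB` helper are illustrative choices, not part of the talk.

```cuda
// device_query.cu: list the CUDA-capable GPUs in this system.
// Build (assumed): nvcc device_query.cu -o device_query
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: convert a byte count to mebibytes for printing.
static double toMiB(size_t bytes) { return bytes / (1024.0 * 1024.0); }

int main() {
    int count = 0;
    cudaError_t err = cudaGetDeviceCount(&count);
    if (err != cudaSuccess) {
        // Typically means no CUDA driver or no CUDA-capable GPU.
        std::printf("cudaGetDeviceCount failed: %s\n", cudaGetErrorString(err));
        return 1;
    }
    std::printf("Found %d CUDA device(s)\n", count);

    for (int dev = 0; dev < count; ++dev) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, dev);
        std::printf("Device %d: %s\n", dev, prop.name);
        std::printf("  Compute capability : %d.%d\n", prop.major, prop.minor);
        std::printf("  Multiprocessors    : %d\n", prop.multiProcessorCount);
        std::printf("  Global memory      : %.0f MiB\n", toMiB(prop.totalGlobalMem));
        std::printf("  Shared mem / block : %zu bytes\n", prop.sharedMemPerBlock);
        std::printf("  Registers / block  : %d\n", prop.regsPerBlock);
    }
    return 0;
}
```

The compute capability reported here is what decides which features (double precision, atomics, caching) a particular GPU offers.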
  • 14. CUDA • Compute Unified Device Architecture • Most popular GPGPU toolkit • CUDA C extends C with a few constructs (e.g. __global__ kernels and the <<<…>>> launch syntax) – Easy to write programs • Lower-level “driver” API is available – Provides more control – Use multiple GPUs in the same application – Mix graphics & compute code • Language bindings available – PyCUDA, Java, .NET • Toolkit provides conveniences • Figure: CUDA Toolkit (Source: NVIDIA CUDA Architecture, Introduction and Overview)
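To illustrate the "more control" of the lower-level driver API, here is a minimal sketch, assuming a reasonably recent CUDA Toolkit and linking against the driver library (`-lcuda`). The program initializes the driver itself, picks each device explicitly, makes that device's context current on the calling thread, and allocates memory on that specific GPU. It retains the device's primary context (the same one the runtime API uses), which keeps it interoperable with runtime calls. The `check` helper and file name are hypothetical.

```cuda
// driver_api.cu: enumerate GPUs and manage contexts explicitly through the
// low-level CUDA driver API. Build (assumed): nvcc driver_api.cu -o driver_api -lcuda
#include <cstdio>
#include <cstdlib>
#include <cuda.h>

// Hypothetical helper: abort with a readable message if a driver call fails.
static void check(CUresult res, const char *what) {
    if (res != CUDA_SUCCESS) {
        const char *msg = nullptr;
        cuGetErrorString(res, &msg);
        std::fprintf(stderr, "%s failed: %s\n", what, msg ? msg : "unknown error");
        std::exit(1);
    }
}

int main() {
    check(cuInit(0), "cuInit");  // must precede every other driver API call

    int count = 0;
    check(cuDeviceGetCount(&count), "cuDeviceGetCount");

    for (int i = 0; i < count; ++i) {
        CUdevice dev;
        check(cuDeviceGet(&dev, i), "cuDeviceGet");

        char name[256];
        size_t totalMem = 0;
        check(cuDeviceGetName(name, sizeof(name), dev), "cuDeviceGetName");
        check(cuDeviceTotalMem(&totalMem, dev), "cuDeviceTotalMem");

        // The application decides which device's context is current on this thread.
        CUcontext ctx;
        check(cuDevicePrimaryCtxRetain(&ctx, dev), "cuDevicePrimaryCtxRetain");
        check(cuCtxSetCurrent(ctx), "cuCtxSetCurrent");

        // Allocate 1 MiB on this particular GPU, then release it.
        CUdeviceptr buf;
        check(cuMemAlloc(&buf, 1 << 20), "cuMemAlloc");
        check(cuMemFree(buf), "cuMemFree");

        // Unbind the context and drop our reference to it.
        check(cuCtxSetCurrent(nullptr), "cuCtxSetCurrent");
        check(cuDevicePrimaryCtxRelease(dev), "cuDevicePrimaryCtxRelease");

        std::printf("GPU %d: %s, %zu MiB\n", i, name, totalMem >> 20);
    }
    return 0;
}
```

With the higher-level runtime API, the initialization and context steps happen implicitly on the first CUDA call.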
  • 15. CUDA Architecture • One or more streaming multiprocessors (“cores”) • Thread Blocks – Single Instruction, Multiple Thread (SIMT) – Hide latency by parallelism • Memory Hierarchy – Fermi GPUs can access system memory • Primitives for – Thread synchronization – Atomic operations on memory • Source: The GPU Computing Era
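The thread-synchronization and atomic primitives listed above are often combined as in this histogram sketch (a standard pattern, not taken from the talk). Each thread block builds a private histogram in shared memory with `atomicAdd`, uses `__syncthreads()` to separate the zero, count and merge phases, and then merges its counts into the global result. The bin count, launch configuration and the `histogramCpu` reference are assumptions for illustration.

```cuda
// histogram.cu: 256-bin byte histogram using block-level cooperation.
// Build (assumed): nvcc histogram.cu -o histogram
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

const int NUM_BINS = 256;

__global__ void histogram(const unsigned char *data, int n, unsigned int *bins) {
    // One private histogram per thread block, in on-chip shared memory.
    __shared__ unsigned int local[NUM_BINS];
    for (int b = threadIdx.x; b < NUM_BINS; b += blockDim.x) local[b] = 0;
    __syncthreads();  // all bins zeroed before any thread starts counting

    // Grid-stride loop: each thread handles several input bytes.
    for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n; i += blockDim.x * gridDim.x)
        atomicAdd(&local[data[i]], 1u);  // threads of a block may hit the same bin
    __syncthreads();  // block's counts complete before merging

    // Merge into the global histogram; other blocks update it concurrently.
    for (int b = threadIdx.x; b < NUM_BINS; b += blockDim.x)
        atomicAdd(&bins[b], local[b]);
}

// CPU reference used to check the GPU result.
void histogramCpu(const unsigned char *data, int n, unsigned int *bins) {
    for (int b = 0; b < NUM_BINS; ++b) bins[b] = 0;
    for (int i = 0; i < n; ++i) bins[data[i]]++;
}

int main() {
    const int n = 1 << 20;
    unsigned char *h_data = (unsigned char *)std::malloc(n);
    for (int i = 0; i < n; ++i) h_data[i] = (unsigned char)(std::rand() & 0xFF);

    unsigned char *d_data;
    unsigned int *d_bins;
    cudaMalloc((void **)&d_data, n);
    cudaMalloc((void **)&d_bins, NUM_BINS * sizeof(unsigned int));
    cudaMemcpy(d_data, h_data, n, cudaMemcpyHostToDevice);
    cudaMemset(d_bins, 0, NUM_BINS * sizeof(unsigned int));

    histogram<<<64, 256>>>(d_data, n, d_bins);

    unsigned int gpu[NUM_BINS], cpu[NUM_BINS];
    cudaMemcpy(gpu, d_bins, sizeof(gpu), cudaMemcpyDeviceToHost);
    histogramCpu(h_data, n, cpu);

    int mismatches = 0;
    for (int b = 0; b < NUM_BINS; ++b) mismatches += (gpu[b] != cpu[b]);
    std::printf("%s\n", mismatches == 0 ? "histogram matches CPU" : "MISMATCH");

    cudaFree(d_data);
    cudaFree(d_bins);
    std::free(h_data);
    return mismatches != 0;
}
```

Keeping a private per-block histogram means most atomic traffic stays in shared memory; each block issues only `NUM_BINS` atomics to global memory.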
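  • 16. Simple Example: Vector Addition
    C/C++ – serial code
      void VecAdd(const float *A, const float *B, float *C, int N)
      {
          for (int i = 0; i < N; i++)
              C[i] = A[i] + B[i];
      }
      VecAdd(A, B, C, N);
    C/C++ with OpenMP – thread-level parallelism
      // "parallel for" creates a team of threads and splits the loop among them
      // (a bare "omp for" outside a parallel region runs on a single thread).
      // Compile with OpenMP enabled, e.g. gcc -fopenmp.
      void VecAdd(const float *A, const float *B, float *C, int N)
      {
          #pragma omp parallel for
          for (int i = 0; i < N; i++)
              C[i] = A[i] + B[i];
      }
      VecAdd(A, B, C, N);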
  • 17. Vector Addition using CUDA
    CUDA C – element-level parallelism
      __global__ void VecAdd(const float *A, const float *B, float *C, int N)
      {
          int i = blockDim.x * blockIdx.x + threadIdx.x;  // this thread's element
          if (i < N)                                      // the last block may overhang N
              C[i] = A[i] + B[i];
      }
    Invoking the function
      float *d_A, *d_B, *d_C;           // device (GPU) pointers
      size_t size = N * sizeof(float);
      // Allocate memory on the GPU
      cudaMalloc((void**)&d_A, size);
      cudaMalloc((void**)&d_B, size);
      cudaMalloc((void**)&d_C, size);
      // Copy arrays to the GPU
      cudaMemcpy(d_A, A, size, cudaMemcpyHostToDevice);
      cudaMemcpy(d_B, B, size, cudaMemcpyHostToDevice);
      // Invoke the kernel: enough 256-thread blocks to cover all N elements
      int threadsPerBlock = 256;
      int blocksPerGrid = (N + threadsPerBlock - 1) / threadsPerBlock;
      VecAdd<<<blocksPerGrid, threadsPerBlock>>>(d_A, d_B, d_C, N);
      // Copy the result back to main memory
      cudaMemcpy(C, d_C, size, cudaMemcpyDeviceToHost);
      // Free GPU memory
      cudaFree(d_A);
      cudaFree(d_B);
      cudaFree(d_C);
    Compilation
      # nvcc vectorAdd.cu -I ../../common/inc
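The slide shows the kernel and the host calls separately. One way to assemble them into a complete, runnable program is sketched below; it adds host buffers, a result check and error checking. The `CUDA_CHECK` macro, the `blocksFor` helper, the array size and the test data are assumptions, not from the talk; the kernel is the slide's.

```cuda
// vectorAdd_full.cu: the slide's VecAdd as one complete program.
// Build (assumed): nvcc vectorAdd_full.cu -o vectorAdd_full
#include <cmath>
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical macro: stop with file/line and message if a runtime call fails.
#define CUDA_CHECK(call)                                                 \
    do {                                                                 \
        cudaError_t err_ = (call);                                       \
        if (err_ != cudaSuccess) {                                       \
            std::fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,      \
                         cudaGetErrorString(err_));                      \
            std::exit(1);                                                \
        }                                                                \
    } while (0)

__global__ void VecAdd(const float *A, const float *B, float *C, int N) {
    int i = blockDim.x * blockIdx.x + threadIdx.x;
    if (i < N) C[i] = A[i] + B[i];  // guard: the last block may overhang N
}

// Number of blocks needed to cover n elements (ceiling division).
int blocksFor(int n, int threadsPerBlock) {
    return (n + threadsPerBlock - 1) / threadsPerBlock;
}

int main() {
    const int N = 50000;
    const size_t size = N * sizeof(float);

    float *A = (float *)std::malloc(size);
    float *B = (float *)std::malloc(size);
    float *C = (float *)std::malloc(size);
    for (int i = 0; i < N; ++i) { A[i] = (float)i; B[i] = 2.0f * i; }

    float *d_A, *d_B, *d_C;
    CUDA_CHECK(cudaMalloc((void **)&d_A, size));
    CUDA_CHECK(cudaMalloc((void **)&d_B, size));
    CUDA_CHECK(cudaMalloc((void **)&d_C, size));
    CUDA_CHECK(cudaMemcpy(d_A, A, size, cudaMemcpyHostToDevice));
    CUDA_CHECK(cudaMemcpy(d_B, B, size, cudaMemcpyHostToDevice));

    const int threadsPerBlock = 256;
    VecAdd<<<blocksFor(N, threadsPerBlock), threadsPerBlock>>>(d_A, d_B, d_C, N);
    CUDA_CHECK(cudaGetLastError());  // a launch returns no status; errors surface here

    // cudaMemcpy on the default stream runs after the kernel has finished.
    CUDA_CHECK(cudaMemcpy(C, d_C, size, cudaMemcpyDeviceToHost));

    int errors = 0;
    for (int i = 0; i < N; ++i)
        if (std::fabs(C[i] - 3.0f * i) > 1e-5f * std::fabs(3.0f * i)) ++errors;
    std::printf("%s\n", errors == 0 ? "Test PASSED" : "Test FAILED");

    CUDA_CHECK(cudaFree(d_A));
    CUDA_CHECK(cudaFree(d_B));
    CUDA_CHECK(cudaFree(d_C));
    std::free(A);
    std::free(B);
    std::free(C);
    return errors != 0;
}
```

Checking `cudaGetLastError()` right after the launch matters because the `<<<…>>>` launch itself reports nothing; a bad configuration would otherwise go unnoticed.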
  • 18. GPU Programming Challenges • Need high “occupancy” for best performance • Extracting parallelism with limited resources – Limited registers – Limited shared memory • Preferred Approach – Small kernels – Multiple passes if needed • Decompose the problem into parallel pieces – Write once, scale and perform everywhere!
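Occupancy can be measured rather than guessed. The sketch below assumes a CUDA Toolkit that provides the occupancy helpers `cudaOccupancyMaxPotentialBlockSize` and `cudaOccupancyMaxActiveBlocksPerMultiprocessor` (these are newer than the CUDA releases current when this talk was given). It also uses a grid-stride loop, one common way to "write once, scale everywhere": the same kernel is correct for any grid size, so the grid can be sized to the GPU rather than to the data. The `saxpy` kernel and `occupancyOf` helper are illustrative.

```cuda
// occupancy.cu: ask the runtime for an occupancy-friendly block size, then
// size the grid to the GPU and let a grid-stride loop cover the data.
// Build (assumed): nvcc occupancy.cu -o occupancy
#include <cstdio>
#include <cuda_runtime.h>

// Grid-stride SAXPY (y = a*x + y): correct for any number of blocks.
__global__ void saxpy(int n, float a, const float *x, float *y) {
    for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n;
         i += blockDim.x * gridDim.x)
        y[i] = a * x[i] + y[i];
}

// Fraction of a multiprocessor's thread slots filled by resident blocks.
float occupancyOf(int blocksPerSM, int blockSize, int maxThreadsPerSM) {
    return (float)(blocksPerSM * blockSize) / (float)maxThreadsPerSM;
}

int main() {
    int minGridSize = 0, blockSize = 0;
    // Block size that maximises occupancy given this kernel's register and
    // shared-memory usage (no dynamic shared memory, no block-size limit).
    cudaOccupancyMaxPotentialBlockSize(&minGridSize, &blockSize, saxpy, 0, 0);

    int blocksPerSM = 0;
    cudaOccupancyMaxActiveBlocksPerMultiprocessor(&blocksPerSM, saxpy, blockSize, 0);

    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, 0);
    std::printf("block size %d, %d resident blocks per SM, occupancy %.0f%%\n",
                blockSize, blocksPerSM,
                100.0f * occupancyOf(blocksPerSM, blockSize, prop.maxThreadsPerMultiProcessor));

    const int n = 1 << 24;
    float *x, *y;
    cudaMalloc((void **)&x, n * sizeof(float));
    cudaMalloc((void **)&y, n * sizeof(float));
    cudaMemset(x, 0, n * sizeof(float));
    cudaMemset(y, 0, n * sizeof(float));

    // One "wave" of blocks fills every multiprocessor; the loop handles the rest.
    int gridSize = blocksPerSM * prop.multiProcessorCount;
    saxpy<<<gridSize, blockSize>>>(n, 2.0f, x, y);
    cudaError_t err = cudaGetLastError();                   // launch-time errors
    if (err == cudaSuccess) err = cudaDeviceSynchronize();  // execution errors
    std::printf("saxpy on %d elements with %d blocks: %s\n",
                n, gridSize, cudaGetErrorString(err));

    cudaFree(x);
    cudaFree(y);
    return err != cudaSuccess;
}
```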
  • 19. GPU Programming • Use shared memory when possible – Cooperation between threads in a block – Reduce access to global memory • Reduce data transfer over the bus • It’s still a GPU! – use textures to your advantage – use vector data types if you can • Watch out for GPU capability differences!
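As an example of threads in a block cooperating through shared memory to cut global-memory traffic, here is a sketch of a 1D stencil, where each output is the sum of the inputs within `RADIUS` of it. Each block loads its slice of the input, plus a small halo on each side, into shared memory once, synchronizes, and then every thread reads its `2*RADIUS+1` inputs from the on-chip copy. `RADIUS`, `BLOCK` and `stencilCpu` are illustrative; the kernel assumes it is launched with exactly `BLOCK` threads per block.

```cuda
// stencil.cu: 1D moving sum using a shared-memory tile with halo.
// Build (assumed): nvcc stencil.cu -o stencil
#include <cstdio>
#include <cuda_runtime.h>

#define RADIUS 3
#define BLOCK 256

// Assumes blockDim.x == BLOCK.
__global__ void stencil1d(const float *in, float *out, int n) {
    __shared__ float tile[BLOCK + 2 * RADIUS];
    int g = blockIdx.x * blockDim.x + threadIdx.x;  // global index
    int l = threadIdx.x + RADIUS;                   // index within the tile

    // Every thread loads its own element; the first RADIUS threads also load
    // the left and right halos. Positions outside [0, n) read as 0.
    tile[l] = (g < n) ? in[g] : 0.0f;
    if (threadIdx.x < RADIUS) {
        int left = g - RADIUS, right = g + BLOCK;
        tile[l - RADIUS] = (left >= 0) ? in[left] : 0.0f;
        tile[l + BLOCK] = (right < n) ? in[right] : 0.0f;
    }
    __syncthreads();  // tile fully loaded before anyone reads neighbours

    if (g < n) {
        float sum = 0.0f;
        for (int k = -RADIUS; k <= RADIUS; ++k) sum += tile[l + k];
        out[g] = sum;
    }
}

// CPU reference: same sum, skipping out-of-range neighbours.
void stencilCpu(const float *in, float *out, int n) {
    for (int i = 0; i < n; ++i) {
        float sum = 0.0f;
        for (int k = -RADIUS; k <= RADIUS; ++k)
            if (i + k >= 0 && i + k < n) sum += in[i + k];
        out[i] = sum;
    }
}

int main() {
    const int n = 1000;
    float h_in[n], h_out[n], ref[n];
    for (int i = 0; i < n; ++i) h_in[i] = 1.0f;

    float *d_in, *d_out;
    cudaMalloc((void **)&d_in, n * sizeof(float));
    cudaMalloc((void **)&d_out, n * sizeof(float));
    cudaMemcpy(d_in, h_in, n * sizeof(float), cudaMemcpyHostToDevice);

    stencil1d<<<(n + BLOCK - 1) / BLOCK, BLOCK>>>(d_in, d_out, n);
    cudaMemcpy(h_out, d_out, n * sizeof(float), cudaMemcpyDeviceToHost);

    stencilCpu(h_in, ref, n);
    int bad = 0;
    for (int i = 0; i < n; ++i) bad += (h_out[i] != ref[i]);
    std::printf("%s\n", bad ? "MISMATCH" : "GPU stencil matches CPU");

    cudaFree(d_in);
    cudaFree(d_out);
    return bad != 0;
}
```

With `RADIUS` 3, each input element is fetched from global memory about once per block (plus `2*RADIUS` halo reads per block) instead of seven times.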
  • 20. Enough Theory! Demo Time & Let’s do some programming 
  • 21. Watch out for • Portability of programs across GPUs – Capabilities vary from GPU to GPU – Memory usage • Arithmetic differences in the result • Pay careful attention to demos…
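Both pitfalls above can be seen in one small program. The sketch below (an illustration, not from the talk) first checks the device's compute capability before relying on float `atomicAdd` on global memory, which needs compute capability 2.0 or later. It then sums the same numbers on the CPU and on the GPU: because floating-point addition is not associative and the atomics run in no fixed order, the two float sums generally differ in the last bits, so results should be compared against a tolerance rather than with `==`. The data, the `nearlyEqual` helper and the 1e-3 tolerance are assumptions; a suitable tolerance depends on the application.

```cuda
// compare_sums.cu: why GPU and CPU results should be compared with a tolerance.
// Build (assumed): nvcc compare_sums.cu -o compare_sums
#include <cmath>
#include <cstdio>
#include <cuda_runtime.h>

__global__ void sumAtomic(const float *x, int n, float *total) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) atomicAdd(total, x[i]);  // order of the additions varies between runs
}

// Hypothetical helper: true if a and b agree to within relTol (relative).
bool nearlyEqual(double a, double b, double relTol) {
    return std::fabs(a - b) <= relTol * std::fmax(std::fabs(a), std::fabs(b));
}

int main() {
    // Capabilities vary: float atomicAdd on global memory needs compute capability 2.0+.
    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, 0);
    if (prop.major < 2) {
        std::printf("%s lacks float atomicAdd (compute capability %d.%d)\n",
                    prop.name, prop.major, prop.minor);
        return 0;
    }

    const int n = 1 << 20;
    float *h = new float[n];
    for (int i = 0; i < n; ++i) h[i] = 1.0f / (1 + i % 1000);

    float cpu = 0.0f;  // serial float sum, left to right
    double ref = 0.0;  // double-precision reference
    for (int i = 0; i < n; ++i) { cpu += h[i]; ref += h[i]; }

    float *d_x, *d_total, gpu = 0.0f;
    cudaMalloc((void **)&d_x, n * sizeof(float));
    cudaMalloc((void **)&d_total, sizeof(float));
    cudaMemcpy(d_x, h, n * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemset(d_total, 0, sizeof(float));
    sumAtomic<<<(n + 255) / 256, 256>>>(d_x, n, d_total);
    cudaMemcpy(&gpu, d_total, sizeof(float), cudaMemcpyDeviceToHost);

    std::printf("CPU float sum : %.6f\n", cpu);
    std::printf("GPU float sum : %.6f\n", gpu);
    std::printf("double ref    : %.6f\n", ref);
    std::printf("CPU == GPU bitwise: %s\n", cpu == gpu ? "yes" : "no");
    std::printf("GPU within 1e-3 of ref: %s\n", nearlyEqual(gpu, ref, 1e-3) ? "yes" : "no");

    cudaFree(d_x);
    cudaFree(d_total);
    delete[] h;
    return 0;
}
```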
  • 22. Resources • CUDA – Tools on NVIDIA Developer Site  http://developer.nvidia.com/object/gpucomputing.html – CUDPP  http://code.google.com/p/cudpp/ • OpenCL • Google Search!
  • 23. The Future • Better throughput – More GPU cores, scaling by Moore’s law – PCIe Gen 3 • Easier to program • Arbitrary control and data access patterns