CUDA (Seminar '11)

  1. Contents: 1) What is CUDA? 2) Execution model 3) Implementation 4) Applications
  2. What is CUDA? CUDA (Compute Unified Device Architecture) is a hardware and software architecture for computing on the GPU, developed by Nvidia in 2007. A GPU can carry out a massive number of tasks simultaneously and quickly by using several ALUs, which traditionally were programmable only through graphics APIs.
  3. What is CUDA? (contd.) With CUDA there is no need to map computation onto graphics APIs. CUDA provides very fast number crunching and is well suited to highly parallel algorithms and large datasets. It consists of a heterogeneous programming model and software environment (hardware and software models, plus an extension of the C programming language) and is designed to enable heterogeneous computation, i.e. computation split between the CPU and the GPU.
  4. CUDA kernels and threads. Device = GPU: executes the parallel portions of an application as kernels. Host = CPU: executes the serial portions of an application. Kernel = a function that runs on the device; one kernel executes at a time, and many threads execute each kernel. Host and device each have their own memory and are connected by PCI Express x16. A minimal launch sketch is shown below.
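A minimal host/device sketch, assuming a hypothetical kernel named add_one; the <<<blocks, threads>>> launch syntax and the cudaMalloc/cudaFree calls are standard CUDA runtime API:

    #include <cuda_runtime.h>

    // Kernel: runs on the device (GPU) and is executed by many threads in parallel.
    __global__ void add_one(float *data)
    {
        data[threadIdx.x] += 1.0f;                  // each thread handles one element
    }

    int main(void)
    {
        float *d_data;
        cudaMalloc((void**)&d_data, 32 * sizeof(float));   // device memory

        // The host (CPU) runs serial code and launches the kernel on the device.
        add_one<<<1, 32>>>(d_data);                 // 1 block of 32 threads
        cudaDeviceSynchronize();

        cudaFree(d_data);
        return 0;
    }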
  5. Arrays of parallel threads. A CUDA kernel is executed by an array of threads. All threads run the same code, and each thread has an ID that it uses to compute memory addresses, as in the sketch below.
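A minimal per-thread indexing sketch (the kernel name scale and its scale factor are illustrative, not from the slides):

    // Every thread runs the same code; the built-in threadIdx gives each thread its ID.
    __global__ void scale(float *a, float factor)
    {
        int i = threadIdx.x;     // this thread's ID within the block
        a[i] *= factor;          // the ID is used to compute the memory address
    }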
  6. Thread batching. Thread cooperation is valuable: threads can share results to avoid redundant computation and can share memory accesses. A thread block is a group of threads that cooperate using shared memory and synchronization. Within a block the thread ID is calculated as x + y*Dx for a 2-dimensional block, where (x, y) is the thread index and (Dx, Dy) the block size (see the index sketch after the next slide).
  7. Thread batching (contd.). For a 3-dimensional block the thread ID is x + y*Dx + z*Dx*Dy, where (x, y, z) is the thread index and (Dx, Dy, Dz) the block size. Grid = group of thread blocks.
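In CUDA C these formulas map directly onto the built-in threadIdx and blockDim variables; a minimal sketch (the kernel name is illustrative):

    __global__ void flat_index_demo(int *out)
    {
        // 2D block:  tid = x + y*Dx
        // 3D block:  tid = x + y*Dx + z*Dx*Dy
        int tid = threadIdx.x
                + threadIdx.y * blockDim.x
                + threadIdx.z * blockDim.x * blockDim.y;
        out[tid] = tid;                  // each thread writes to its own slot
    }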
  8. Thread batching (contd.). Each block also has a block ID, calculated in the same way as the thread ID. Threads in different blocks cannot cooperate. A common use of both IDs is the global thread index sketched below.
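A minimal 1D sketch of combining the block ID and the thread ID into a global index (the vector-add kernel is illustrative):

    __global__ void vec_add(const float *a, const float *b, float *c, int n)
    {
        // Global index = block ID * block size + thread ID within the block.
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n)
            c[i] = a[i] + b[i];
        // __syncthreads() only synchronizes threads of the same block;
        // threads in different blocks cannot cooperate this way.
    }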
  9. Transparent scalability. The hardware is free to schedule thread blocks on any processor, so a kernel scales across any number of parallel multiprocessors.
  10. CUDA architectures.

      Architecture codename            G80           GT200         Fermi
      Release year                     2006          2008          2010
      Number of transistors            681 million   1.4 billion   3.0 billion
      Streaming multiprocessors (SMs)  16            30            16
      Streaming processors (per SM)    8             8             32
      Streaming processors (total)     128           240           512
      Shared memory (per SM)           16 KB         16 KB         configurable 48 KB or 16 KB
      L1 cache (per SM)                none          none          configurable 16 KB or 48 KB
  11. 8 & 10 series architecture (block diagrams of the G80 and GT200 GPUs).
  12. Kernel memory access: per thread (registers and local memory, private to each thread), per block (shared memory, visible to all threads of the block), and per device (global memory, visible to all threads and to the host). A declaration sketch follows.
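A minimal sketch of how these scopes appear in CUDA C (names and sizes are illustrative; it assumes blockDim.x <= 256):

    __device__ float g_table[256];            // per device: global memory

    __global__ void scopes_demo(const float *global_in, float *global_out)
    {
        int i = threadIdx.x;                  // per thread: held in a register
        __shared__ float tile[256];           // per block: shared memory

        tile[i] = global_in[i] + g_table[i];  // read from global memory
        __syncthreads();                      // threads of one block cooperate
        global_out[i] = tile[i];
    }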
  13. Physical memory layout. "Local" memory actually resides in device DRAM, so use registers and shared memory to minimize local memory use. The host can read and write global memory but not shared memory. One common pattern is sketched below.
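A minimal sketch of staging reused data in shared memory instead of re-reading device DRAM (the kernel, the TILE size, and the assumption that each block is launched with TILE threads are illustrative):

    #define TILE 128

    __global__ void neighbor_sum(const float *in, float *out, int n)
    {
        __shared__ float s[TILE + 1];                 // on-chip, shared by the block
        int i = blockIdx.x * TILE + threadIdx.x;

        if (i < n)
            s[threadIdx.x] = in[i];                   // one DRAM read per element
        if (threadIdx.x == 0 && i + TILE < n)
            s[TILE] = in[i + TILE];                   // halo element for this block
        __syncthreads();

        if (i + 1 < n)
            out[i] = s[threadIdx.x] + s[threadIdx.x + 1];   // reuse from shared memory
    }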
  14. Execution model. Threads are executed by thread processors, thread blocks are executed by multiprocessors, and a kernel is launched as a grid of thread blocks.
  15. CUDA software development (diagram).
  16. Compiling CUDA code. The CUDA nvcc compiler compiles .cu files, splitting the code into device code (Nvidia assembly) and host C++ code, as in the sketch below.
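A minimal compile-and-run sketch, assuming the file is named example.cu (the optional architecture flag value is illustrative and depends on the target GPU):

    // example.cu, compile with:  nvcc example.cu -o example
    // (optionally, e.g.:         nvcc -arch=sm_20 example.cu -o example)
    #include <cuda_runtime.h>
    #include <cstdio>

    __global__ void hello(void) { }    // device code: compiled by nvcc to Nvidia assembly (PTX)

    int main(void)                     // host code: handed to the host C++ compiler
    {
        hello<<<1, 1>>>();
        cudaDeviceSynchronize();
        printf("kernel launched\n");
        return 0;
    }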
  17. Example (the slide's diagram shows host arrays a_h, b_h mirrored by device arrays a_d, b_d):

      #include <stdlib.h>
      #include <assert.h>
      #include <cuda_runtime.h>

      int main(void)
      {
          float *a_h, *b_h;        // host data
          float *a_d, *b_d;        // device data
          int N = 15, nBytes, i;

          nBytes = N * sizeof(float);
          a_h = (float*)malloc(nBytes);
          b_h = (float*)malloc(nBytes);
          cudaMalloc((void**)&a_d, nBytes);
          cudaMalloc((void**)&b_d, nBytes);

          for (i = 0; i < N; i++) a_h[i] = 100.f + i;

          cudaMemcpy(a_d, a_h, nBytes, cudaMemcpyHostToDevice);    // host   -> device
          cudaMemcpy(b_d, a_d, nBytes, cudaMemcpyDeviceToDevice);  // device -> device
          cudaMemcpy(b_h, b_d, nBytes, cudaMemcpyDeviceToHost);    // device -> host

          for (i = 0; i < N; i++) assert(a_h[i] == b_h[i]);

          free(a_h); free(b_h);
          cudaFree(a_d); cudaFree(b_d);
          return 0;
      }
  18. Applications: finance, numerics, medical, oil & gas, biophysics, audio, video, imaging.
  19. Advantages: provides shared memory; cost effective; the gaming industry's demand for graphics cards has poured a great deal of research and money into improving GPUs; transparent scalability.
  20. Drawbacks: despite having hundreds of "cores", CUDA hardware is not as flexible as a CPU; not as effective for personal computers.
  21. Future scope: implementation of CUDA on GPUs from other companies; more and more streaming processors can be included; CUDA support in a wider variety of programming languages.
  22. Conclusion. CUDA has brought significant innovations to the high-performance computing world. It has simplified the development of general-purpose parallel applications, and these applications now have enough computational power to obtain proper results in a short time.
  23. References
      1. "CUDA by Example: An Introduction to General-Purpose GPU Programming" by Jason Sanders and Edward Kandrot.
      2. "Programming Massively Parallel Processors: A Hands-on Approach (Applications of GPU Computing Series)" by David B. Kirk and Wen-mei W. Hwu.
      3. "GPU Computing Gems Emerald Edition (Applications of GPU Computing Series)" by Wen-mei W. Hwu.
      4. "The Cost To Play: CUDA Programming" by Douglas Eadline, Ph.D., Linux Magazine, February 17, 2010.
      5. "Nvidia Announces CUDA x86" by Cristian, Tech Connect Magazine, September 21, 2010.
      6. CUDA Programming Guide, ver. 1.1, http://www.nvidia.com/object/cuda_develop.html
      7. Tesla GPU Computing Technical Brief, http://www.nvidia.com/object/tesla_product_literature.html
      8. G80 architecture reviews and specifications, http://www.nvidia.com/page/8800_reviews.html, http://www.nvidia.com/page/8800_tech_specs.html
      9. Beyond3D, "G80: Architecture and GPU Analysis", http://www.beyond3d.com/content/reviews/1
      10. Graphics adapters supporting CUDA, http://www.nvidia.com/object/cuda_learn_products.html
  24. Questions?
