GPU - An Introduction


GPU, GPU Architecture, CUDA, TLP

Published in: Technology

  1. 1. Graphics Processing Unit DHAN V SAGAR CB.EN.P2CSE13007
  2. 2. Introduction
     ● A GPU is a processor optimized for 2D/3D graphics, video, visual computing, and display.
     ● It is a highly parallel, highly multithreaded multiprocessor.
     ● It provides real-time visual interaction with computed objects via graphics, images, and video.
  3. 3. History
     ● Up to the late 1990s
       – No GPUs
       – Display was handled by a much simpler VGA controller
     ● The VGA controller consisted of
       – A memory controller
       – A display generator plus DRAM
     ● The DRAM was either shared with the CPU or private
  4. 4. History
     ● By 1997
       – More complex VGA controllers
     ● These incorporated 3D acceleration functions in hardware
       – Triangle setup and rasterization
       – Texture mapping and shading
     ● Rasterization: converting a combination of shapes (lines, polygons, letters, …) into an image consisting of individual pixels
  5. 5. History
     ● By 2000
       – A single-chip graphics processor incorporated nearly all functions of the graphics pipeline of high-end workstations
     ● Beginning of the end of the high-end workstation market
       – The VGA controller was renamed the graphics processing unit (GPU)
  6. 6. Current Trends
     ● Well-defined APIs
       – OpenGL: open standard for 3D graphics programming
       – WebGL: OpenGL-based API for the web
       – DirectX: set of Microsoft multimedia programming interfaces (Direct3D for 3D graphics)
     ● Programmers can implement novel graphics algorithms
     ● GPUs can be used for non-conventional applications
  7. 7. Current Trends
     ● Combining the power of CPU and GPU: heterogeneous architectures
     ● GPUs become scalable parallel processors
     ● Moving from hardware-defined pipelined architectures to more flexible programmable architectures
  8. 8. Architecture Evolution
     ● Floating-point co-processors were attached to microprocessors.
     ● Interest in providing hardware support for displays led to graphics processing units (GPUs).
     [Diagram: Memory, CPU, Graphics card, Display]
  9. 9. GPUs with dedicated pipelines
     ● Graphics chips generally had a pipeline structure, with individual stages performing specialized operations and finally loading the frame buffer for display.
     ● Individual stages may have access to graphics memory for storing intermediate computed data.
     [Diagram: Input stage, Vertex shader stage, Geometry shader stage, Rasterizer stage, Pixel shading stage, Frame buffer; Graphics memory]
  10. 10. Programming GPUs
     ● Will focus on parallel computing applications
     ● Must decompose the problem into a set of parallel computations
     ● Ideally a two-level decomposition, to match the GPU organization
  11. 11. Example
     [Diagram: the data are in a big array, which is split into small arrays; each small array is split further into tiny pieces]
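
The two-level decomposition pictured above can be sketched in plain C. This is only an illustration under assumptions: the function names `sum_block` and `two_level_sum` and the choice of a sum as the computation are hypothetical, not from the slides. The outer loop walks the small arrays (the role a thread block would play), and the inner loop walks the tiny pieces (the role a thread would play). On a GPU both levels would run in parallel; here they run one after another.

```c
#include <stddef.h>

/* Hypothetical helper: sums one "small array" (one block of the big array).
   On a GPU, each element of the block would be handled by its own thread. */
static double sum_block(const double *block, size_t len) {
    double partial = 0.0;
    for (size_t t = 0; t < len; ++t)
        partial += block[t];
    return partial;
}

/* Two-level sum: level 1 splits the big array into blocks of block_size
   elements, level 2 processes the elements inside each block. */
double two_level_sum(const double *data, size_t n, size_t block_size) {
    double total = 0.0;
    if (block_size == 0)
        return total;              /* nothing sensible to do with empty blocks */
    for (size_t start = 0; start < n; start += block_size) {
        /* The last block may be shorter than block_size. */
        size_t len = (n - start < block_size) ? n - start : block_size;
        total += sum_block(data + start, len);
    }
    return total;
}
```

For example, calling `two_level_sum(data, 10, 4)` treats a 10-element array as blocks of 4, 4, and 2 elements.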
  12. 12. GPGPU and CUDA
     GPGPU
     ● General-purpose computing on GPUs
     ● Uses the traditional graphics API and graphics pipeline
     CUDA
     ● Compute Unified Device Architecture
     ● Parallel computing platform and programming model
     ● Developed by NVIDIA
     ● Single Program, Multiple Data (SPMD) approach
  13. 13. CUDA
     ➢ CUDA programs are written in C
     ➢ Within C programs, call SIMT “kernel” routines that are executed on the GPU
     ➢ Provides three abstractions
       ➢ Hierarchy of thread groups
       ➢ Shared memory
       ➢ Barrier synchronization
  14. 14. Cont..
  15. 15. CUDA
     ● Lowest level of parallelism: the CUDA thread
     ● The compiler and hardware can gang thousands of CUDA threads together, which leads to various levels of parallelism within the GPU: MIMD, SIMD, and instruction-level parallelism
     ● Programming model: Single Instruction, Multiple Thread (SIMT)
  16. 16. Conventional C Code
     // Invoke DAXPY
     daxpy(n, 2.0, x, y);

     // DAXPY in C: y = a*x + y, one element per loop iteration
     void daxpy(int n, double a, double *x, double *y)
     {
         for (int i = 0; i < n; ++i)
             y[i] = a * x[i] + y[i];
     }
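  17. 17. Corresponding CUDA Code
     // Host code: invoke DAXPY with 256 threads per thread block
     int nblocks = (n + 255) / 256;   // round up so every element is covered
     daxpy<<<nblocks, 256>>>(n, 2.0, x, y);

     // DAXPY in CUDA: a kernel launched from the host must be __global__
     __global__ void daxpy(int n, double a, double *x, double *y)
     {
         int i = blockIdx.x * blockDim.x + threadIdx.x;  // global element index
         if (i < n)                                      // skip threads past the end
             y[i] = a * x[i] + y[i];
     }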
  18. 18. Cont...
     ● __device__ or __global__: functions that run on the GPU (a __global__ function is a kernel launched from the host)
     ● __host__: functions that run on the system processor
     ● CUDA variables declared with __device__ are allocated in GPU memory, which is accessible by all the multithreaded SIMD processors
     ● The call syntax for a function that runs on the GPU is name<<<dimGrid, dimBlock>>>(... parameter list ...)
     ● The GPU hardware handles the threads
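
The launch configuration and index arithmetic from the CUDA slide can be checked without a GPU. The sketch below is a sequential stand-in written in plain C, not CUDA: the names `num_blocks`, `daxpy_one_thread`, and `daxpy_emulated` are hypothetical, and the two nested loops only imitate what the hardware would run in parallel. It shows why the block count is rounded up and why the kernel needs the `i < n` guard.

```c
/* Number of thread blocks needed to cover n elements (rounds up),
   the same arithmetic as (n + 255) / 256 for 256 threads per block. */
int num_blocks(int n, int threads_per_block) {
    return (n + threads_per_block - 1) / threads_per_block;
}

/* Work done by one (block, thread) pair, mirroring the kernel body.
   block_idx, block_dim, thread_idx stand in for blockIdx.x, blockDim.x,
   and threadIdx.x. */
static void daxpy_one_thread(int block_idx, int block_dim, int thread_idx,
                             int n, double a, const double *x, double *y) {
    int i = block_idx * block_dim + thread_idx;   /* global element index */
    if (i < n)                                    /* last block may overshoot n */
        y[i] = a * x[i] + y[i];
}

/* Sequential emulation of daxpy<<<nblocks, block_dim>>>(n, a, x, y). */
void daxpy_emulated(int n, int block_dim, double a,
                    const double *x, double *y) {
    int nblocks = num_blocks(n, block_dim);
    for (int b = 0; b < nblocks; ++b)             /* one iteration per block */
        for (int t = 0; t < block_dim; ++t)       /* one iteration per thread */
            daxpy_one_thread(b, block_dim, t, n, a, x, y);
}
```

With n = 1000 and 256 threads per block, `num_blocks` gives 4 blocks, so 1024 threads are created and the last 24 do nothing because of the guard.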
  19. 19. Thread Blocks
     ● Threads are grouped into thread blocks, and the hardware executes them in groups of 32 threads (NVIDIA calls such a group a warp)
     ● The hardware that executes a whole block of threads is called a multithreaded SIMD processor
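
The grouping into sets of 32 threads comes down to simple integer arithmetic. The following C sketch assumes a one-dimensional thread block with threads numbered from 0; the function names are hypothetical, and the group size of 32 is the figure given on the slide (NVIDIA calls such a group a warp).

```c
#define WARP_SIZE 32   /* threads per execution group, as stated on the slide */

/* How many 32-thread groups a block of the given size occupies (rounds up). */
int warps_per_block(int threads_per_block) {
    return (threads_per_block + WARP_SIZE - 1) / WARP_SIZE;
}

/* Which group within its block a thread belongs to. */
int warp_of_thread(int thread_idx) {
    return thread_idx / WARP_SIZE;
}

/* Position of the thread inside its group (0 to 31). */
int lane_of_thread(int thread_idx) {
    return thread_idx % WARP_SIZE;
}
```

For the 256-thread blocks used in the DAXPY launch, `warps_per_block(256)` is 8, and thread 70 falls in group 2 at position 6.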
  21. 21. Thank You..