GPU Computing: A brief overview




  1. GPU COMPUTING
     Presented by Rajiv Kumar, No. 34, S7C
  2. Graphics Processing Units (GPUs): Powerful, Programmable, and Highly Parallel
     Jen-Hsun Huang: "GPU power is set to increase 570x, whereas CPU power would increase a mere 3x over the same timeframe of six years."
  3. INTRODUCTION:
     • GPUs have long powered the displays of computers
     • Designed for real-time, high-resolution 3D graphics tasks
     • Commercial GPU-based systems are becoming common
     • NVIDIA and AMD are expanding processor sophistication and software development tools
     • High accuracy through higher floating-point precision
     • GPUs are now on a development cycle much closer to that of CPUs
     • The GPU is not constrained by sockets
     • Only minimal backward compatibility is needed in firmware; the rest is delivered through the driver implementation
  4. Requirements of GPU-based software:
     • Computational requirements are large
     • Parallelism is substantial
     • Throughput is more important than latency
     Application requirements that suit GPGPU programming:
     • Large data sets
     • High parallelism
     • Minimal dependencies between data elements
     • High arithmetic intensity
     • Lots of work to do without CPU intervention
  5. Task vs. Data Parallelism
     Task parallelism:
     • Independent processes with little communication
     Data parallelism:
     • Lots of data on which the same computation is executed
     • No dependencies between data elements within each step of the computation
     • Can saturate many ALUs
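A minimal sketch of the data-parallel pattern described above (the function name and values are illustrative, not from the slides): the same per-element function is applied independently to every record, so the work could be split across any number of ALUs.

```python
# Data parallelism: one pure function applied to every element,
# with no dependence on any other element.
def brighten(pixel):
    # Per-element kernel: clamp brightened value to 255.
    return min(pixel + 40, 255)

pixels = [0, 100, 200, 250]

# Serial reference loop; a GPU would run one instance of `brighten`
# per element in parallel and produce the same result.
result = [brighten(p) for p in pixels]
print(result)  # [40, 140, 240, 255]
```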
  6. GPU vs. CPU
     • The CPU is designed to process a single task as fast as possible, while the GPU is designed to process as many tasks as possible over a large volume of data
     • The CPU divides work in time, while the GPU divides work in space
  7. Graphics Pipeline:
     • Input to the GPU is a list of geometric primitives
     • Vertex Operations: each primitive is transformed into screen space and its vertices are shaded, typically by computing their interaction with the lights in the scene
     • Primitive Assembly: vertices are assembled into triangles
     • Rasterization: determines which screen-space pixels are covered by each triangle
     • Fragment Operations: using color information, each fragment is shaded to determine its final color
     • Each pixel's color value may be computed from several fragments
     • Composition: fragments are assembled into a final image with one color per pixel
  8. Graphics Pipeline: [diagram]
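The pipeline stages above can be sketched, very loosely, as a chain of functions. All names and per-stage behaviors here are simplified stand-ins, not the real fixed-function hardware (real rasterization covers pixels inside the triangle; here each stage only mimics the data flow):

```python
# Toy data flow: vertices -> triangles -> fragments -> pixels.

def vertex_stage(vertices):
    # Stand-in transform into "screen space" (here: a simple scale).
    return [(x * 2, y * 2) for (x, y) in vertices]

def primitive_assembly(vertices):
    # Group every three vertices into one triangle.
    return [tuple(vertices[i:i + 3]) for i in range(0, len(vertices), 3)]

def rasterize(triangle):
    # Stand-in: emit one fragment per triangle vertex.
    return [{"pos": v, "color": 0.5} for v in triangle]

def compose(fragments):
    # Composition: one color per pixel position; later fragments win.
    image = {}
    for frag in fragments:
        image[frag["pos"]] = frag["color"]
    return image

verts = [(0, 0), (1, 0), (0, 1)]
tris = primitive_assembly(vertex_stage(verts))
frags = [f for t in tris for f in rasterize(t)]
image = compose(frags)
print(image)
```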
  9. Evolution of GPU Architecture:
     • The fixed-function pipeline lacked the generality needed for complex effects
     • Fixed-function per-vertex and per-fragment operations were replaced by vertex and fragment programs
     • Vertex and fragment programs grew in complexity as the Shader Model evolved
     • Support for unified shader models
     Shader Models:
     • A shader, written in a language such as GLSL, provides a user-defined programmable alternative to the hard-coded fixed-function approach
     • A vertex shader describes the traits (position, color, depth value, etc.) of a vertex
     • A geometry shader adds volumetric detail; its output is then sent to the rasterizer
     • A pixel/fragment shader describes the traits (color, z-depth, and alpha value) of a pixel
  10. GPU Programming Model
     • Follows an SPMD (single program, multiple data) programming model
     • In the base programming model, each element is independent of all other elements
     • Many parallel elements are processed by a single program
     • Each element can operate on integer or floating-point data with a reasonably complete instruction set
     • Reads data from shared memory via scatter and gather operations
     • Code executes in a SIMD manner
     • A different execution path is allowed for each element
     • If elements branch in different directions, both branches are computed
     • Computation proceeds in blocks on the order of 16 elements
     • In short: branches are permitted, but they are not free
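A sketch of SPMD branch divergence, as described above (the kernel and block contents are invented for illustration): every element runs the same program, but when elements of one block take different branches, the hardware effectively evaluates both paths and masks out the inactive lanes, which is why branches are not free.

```python
# Same program for every element (SPMD); the branch may diverge.
def kernel(x):
    if x % 2 == 0:
        return x * 10   # path A: taken by the even elements
    else:
        return x + 1    # path B: taken by the odd elements

block = [1, 2, 3, 4]    # one block of elements
# Serially this costs one branch per element; on SIMD hardware a
# divergent block pays for BOTH paths across the whole block.
results = [kernel(x) for x in block]
print(results)  # [2, 20, 4, 40]
```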
  11. GPU Architecture: NVIDIA
     [figure: NVIDIA 8800 GTX architecture (top); a pair of SMs (right)]
  12. Memory Architecture
     • Capable of reading and writing anywhere in local (GPU) memory or elsewhere
     • These non-cached memories have large read/write latencies, which can be masked by the extremely long pipeline as long as instructions are not left waiting on a read
  13. GPGPU Programming
     Stream processing is a paradigm for maximizing the efficiency of parallel computing. It can be decomposed into two parts:
     • Stream: a collection of objects that require the same computation and can be operated on in parallel
     • Kernel: a function applied to the entire stream, much like a "for each" loop
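The stream/kernel decomposition above can be sketched in a few lines (the kernel and stream contents are illustrative): a kernel is one function, and applying it across the stream is the "for each" loop.

```python
# Kernel: the same computation for every record in the stream.
def kernel(record):
    return record * record

# Stream: a collection of records requiring the same computation.
stream = [1, 2, 3, 4]

# "For each" application of the kernel over the entire stream; on a
# GPU, each application could run on a different processing element.
out = [kernel(r) for r in stream]
print(out)  # [1, 4, 9, 16]
```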
  14. Terminology:
     Streams
     – Collections of records requiring similar computation, e.g. vertex positions, voxels, etc.
     – Provide data parallelism
     Kernels
     – Functions (transforms) applied to each element in the stream
     – No dependencies between stream elements, which encourages high arithmetic intensity
     Gather
     – Indirect read from memory ( x = a[i] )
     – Naturally maps to a texture fetch
     – Used to access data structures and data streams
     Scatter
     – Indirect write to memory ( a[i] = x )
     – Needed for building many data structures
     – Usually done on the CPU
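The gather and scatter patterns defined above, shown with index arrays (the arrays themselves are made up for the example):

```python
a = [10, 20, 30, 40]
idx = [3, 0, 2]

# Gather: indirect READ, x = a[i] driven by an index array.
gathered = [a[i] for i in idx]
print(gathered)  # [40, 10, 30]

# Scatter: indirect WRITE, a[i] = x driven by the same index array.
out = [0, 0, 0, 0]
values = [7, 8, 9]
for j, i in enumerate(idx):
    out[i] = values[j]
print(out)  # [8, 0, 9, 7]
```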
  15. What can you do on GPUs other than graphics?
     • Large matrix/vector operations (BLAS)
     • Protein folding (molecular dynamics)
     • FFT (SETI, signal processing)
     • Ray tracing
     • Physics simulation (cloth, fluid, collision)
     • Sequence matching (hidden Markov models)
     • Speech recognition (hidden Markov models, neural nets)
     • Databases
     • Sort/search
     • Medical imaging (image segmentation, processing)
     And many, many more…
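To make the first item concrete, here is a BLAS-style matrix-vector product shown as a serial sketch (the function and data are illustrative): each output element is an independent dot product, which is exactly why this workload maps so well onto GPUs.

```python
# gemv-style matrix-vector product: y = A @ x.
def matvec(matrix, vec):
    # Each row's dot product is independent of every other row, so
    # all rows could be computed in parallel on a GPU.
    return [sum(m * v for m, v in zip(row, vec)) for row in matrix]

A = [[1, 2],
     [3, 4]]
x = [5, 6]
print(matvec(A, x))  # [17, 39]
```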
  16. Future of GPU Computing:
     • A higher-bandwidth PCI-E bus between CPU and GPU
     • AMD's Fusion and Intel's Ivy Bridge place both CPU and GPU elements on a single chip
     • Addition of AVX instructions to CPU architectures
     • Fully programmable pipelines in place of the current few programmable shading stages in the fixed graphics pipeline
     • Flexibility to combine a variety of rendering with general-purpose processing
  17. Looking Ahead: [figure]
  18. Problems in GPGPU Computing
     • A killer app?
     • Programming models and tools: their proprietary nature
     • The GPU in tomorrow's computer: will it be dissolved into or absorbed by the CPU?
     • Relationship to other parallel hardware and software
     • Managing rapid change
     • Performance evaluation and performance cliffs
     • A broader toolbox for computation and data structures; a "vertical" model for application development
     • Faults and lack of precision
  19. Drawbacks:
     • Power consumption
     • Increasing die size
     • Multi-die solutions require inter-die connections, increasing packaging and wafer cost
     • An increasing amount of die space goes to control logic, registers, and cache as the GPU becomes more flexible and programmable
     • Comparing CPUs to GPUs is more like comparing apples to oranges
     • Still lots of fixed-function hardware
     • Integration of multimedia fixed functions within CPUs
  20. References:
     • GPU Computing Gems, Emerald Edition, by Wen-mei W. Hwu
     • CUDA by Example: An Introduction to General-Purpose GPU Computing, by J. Sanders and E. Kandrot (July 2010)
     • ader_Programs.html
     • "GPU Computing," Proceedings of the IEEE, May 2008
     • "Evolution of GPU" by Chris Sietz
  21. Thank You All… Any Questions?