Stream Processing

Presented in CMPUT 429, Fall '11

  1. 1. Stream processing is a computer programming paradigm related to SIMD.
  2. 2. Stream processing is a computer programming paradigm related to SIMD. It allows some applications to more easily exploit a limited form of parallel processing.
  3. 3. A stream is simply a set of records that require similar computation. Streams provide data parallelism.
  4. 4. A stream is simply a set of records that require similar computation. Streams provide data parallelism. Kernels are the functions that are applied to each element in the stream.
  5. 5. A stream is simply a set of records that require similar computation. Streams provide data parallelism. Kernels are the functions that are applied to each element in the stream. For each element we can only read from the input, perform operations on it, and write to the output.
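
The stream and kernel vocabulary can be illustrated with a minimal Python sketch. The `Sample` record type, the `to_mono` kernel, and the `run_kernel` helper are hypothetical names chosen for illustration and do not come from the slides; the point is only that the kernel reads one input record, computes on it, and writes one output value, with no access to other records.

```python
from typing import Callable, Iterable, Iterator, NamedTuple


class Sample(NamedTuple):
    """One record in the input stream (hypothetical example type)."""
    left: float
    right: float


def run_kernel(kernel: Callable[[Sample], float],
               stream: Iterable[Sample]) -> Iterator[float]:
    """Apply the kernel to every record independently, yielding an output stream."""
    for record in stream:
        yield kernel(record)


def to_mono(sample: Sample) -> float:
    """Kernel: read one input record, compute, and return one output value."""
    return (sample.left + sample.right) / 2.0


stereo = [Sample(0.0, 1.0), Sample(0.5, 0.5), Sample(-1.0, 1.0)]
mono = list(run_kernel(to_mono, stereo))
```

Because `to_mono` never looks at neighbouring records, the records could in principle be processed in any order or all at once, which is the data parallelism that streams provide.
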
  6. 6. Stream processing is especially suitable for applications that exhibit three characteristics ---
  10. 10. Flynn’s Taxonomy: SISD. Single Instruction: only one instruction stream is being acted on by the CPU during any one clock cycle. Single Data: only one data stream is being used as input during any one clock cycle.
  11. 11. Flynn’s Taxonomy: SIMD. Single Instruction: all processing units execute the same instruction at any given clock cycle. Multiple Data: each processing unit can operate on a different data element.
  12. 12. Flynn’s Taxonomy: MISD. Multiple Instruction: each processing unit operates on the data independently via separate instruction streams. Single Data: a single data stream is fed into multiple processing units.
  13. 13. Flynn’s Taxonomy: MIMD. Multiple Instruction: every processor may be executing a different instruction stream. Multiple Data: every processor may be working with a different data stream.
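
The four classes of Flynn's taxonomy differ only in whether there is one or more than one instruction stream and one or more than one data stream. A minimal sketch of that rule follows; the function name `flynn_class` is hypothetical and used only for illustration.

```python
def flynn_class(instruction_streams: int, data_streams: int) -> str:
    """Return the Flynn class for the given numbers of concurrent streams."""
    if instruction_streams < 1 or data_streams < 1:
        raise ValueError("a machine needs at least one stream of each kind")
    instr = "S" if instruction_streams == 1 else "M"  # Single or Multiple Instruction
    data = "S" if data_streams == 1 else "M"          # Single or Multiple Data
    return f"{instr}I{data}D"


# One instruction stream driving eight data lanes is the SIMD case.
simd_example = flynn_class(instruction_streams=1, data_streams=8)
```
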
  14. 14. Stream Processors. Stream processing makes use of locality of reference by explicitly grouping related code and data together for easy fetching into the cache.
  15. 15. StreamIt: a stream processing language for programs based on streams of data, e.g. audio, video, DSP, networking, and cryptographic processing kernels; HDTV editing, radar tracking, microphone arrays, cellphone base stations, graphics [Thies 2002]
  16. 16. StreamIt: a high-level, architecture-independent language for streaming applications. 1. Improves programmer productivity (vs. Java, C). 2. Offers scalable performance on multicores. [Thies 2002]
  17. 17. GPU. A GPU is a single-chip processor that creates lighting effects and transforms objects every time a 3D scene is redrawn. Used primarily for 3-D applications. A GPU can be present on a video card, on the motherboard, or, in certain CPUs, on the CPU die.
  18. 18. World’s First GPU. Nvidia in 1999 marketed the GeForce 256 as "the world's first GPU, a single-chip processor that is capable of processing a minimum of 10 million polygons per second". Rival ATI Technologies coined the term visual processing unit or VPU with the release of the Radeon 9700 in 2002.
  19. 19. GPUs have a very high compute capacity
  21. 21. GPUs have a very high compute capacity. To the hardware, the accelerator looks like another IO unit; it communicates with the CPU using IO commands and DMA memory transfers.
  22. 22. GPUs have a very high compute capacity. To the hardware, the accelerator looks like another IO unit; it communicates with the CPU using IO commands and DMA memory transfers. To the software, the accelerator is another computer to which your program sends data and routines to execute.
  23. 23. GPGPU. This concept turns the massive floating-point computational power of a modern graphics accelerator into general-purpose computing power.
  26. 26. GPGPU. This concept turns the massive floating-point computational power of a modern graphics accelerator into general-purpose computing power. GPUs are stream processors: processors that can operate in parallel by running a single kernel on many records in a stream at once.
  27. 27. GPGPU. This concept turns the massive floating-point computational power of a modern graphics accelerator into general-purpose computing power. GPUs are stream processors: processors that can operate in parallel by running a single kernel on many records in a stream at once. Ideal GPGPU applications have large data sets, high parallelism, and minimal dependency between data elements.
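
Why "minimal dependency between data elements" matters can be seen in a small sketch. It is an illustration under stated assumptions, not GPU code: a Python thread pool stands in for the GPU's parallel lanes, and the `scale` kernel and sample data are hypothetical. No real speedup should be expected from it; only the dependency structure is the point.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import accumulate


def scale(x: float) -> float:
    """Kernel with no cross-element dependency: each output needs only its own input."""
    return 2.0 * x


data = [1.0, 2.0, 3.0, 4.0]

# Independent elements: every record can be handed to a different worker.
# Executor.map returns results in input order regardless of completion order.
with ThreadPoolExecutor(max_workers=4) as pool:
    scaled = list(pool.map(scale, data))

# Dependent elements: written as a running sum, each output needs the previous
# output, so this formulation must be evaluated one element after another.
running = list(accumulate(data))
```

The first computation fits the stream model directly; the second, in this naive form, does not, because each element waits on its predecessor.
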
  28. 28. In certain circumstances the GPU calculates forty times faster than conventional CPUs.
  29. 29. In certain circumstances the GPU calculates forty times faster than conventional CPUs. AMD Athlon 64 X2 (CPU): 154 m.
  30. 30. In certain circumstances the GPU calculates forty times faster than conventional CPUs. AMD Athlon 64 X2 (CPU): 154 m; ATI X1950 XTX (GPU): 384 m.
  31. 31. In certain circumstances the GPU calculates forty times faster than conventional CPUs. AMD Athlon 64 X2 (CPU): 154 m; ATI X1950 XTX (GPU): 384 m; Intel Core 2 Quad (CPU): 582 m.
  32. 32. In certain circumstances the GPU calculates forty times faster than conventional CPUs. AMD Athlon 64 X2 (CPU): 154 m; ATI X1950 XTX (GPU): 384 m; Intel Core 2 Quad (CPU): 582 m; NVIDIA G8800 GTX (GPU): 680 m.
  33. 33. “The processing power of just 5,000 ATI processors is also enough to rival that of the existing 200,000 computers currently involved in the Folding@home project” [Ref 1]
  34. 34. “The processing power of just 5,000 ATI processors is also enough to rival that of the existing 200,000 computers currently involved in the Folding@home project” “...it is estimated that if a mere 10,000 computers were to each use an ATI processor to conduct folding research, that the Folding@home program would effectively perform faster than the fastest supercomputer in existence today, surpassing the 1 petaFLOP level” (2007). As of November 10, 2011: Folding@home 6.0 petaFLOPS, compared with 8.162 petaFLOPS for the K computer. [Ref 1]
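
The figures quoted above imply some simple ratios, worked through in the sketch below. All inputs are taken from the quotes; the variable names are illustrative, and the per-processor throughput is only what the quoted estimate implies, not a measured value.

```python
# "5,000 ATI processors ... rival ... 200,000 computers"
ati_processors = 5_000
existing_computers = 200_000
computers_per_processor = existing_computers / ati_processors

# "10,000 computers ... surpassing the 1 petaFLOP level"
target_flops = 1e15  # 1 petaFLOP
processors_for_target = 10_000
implied_flops_per_processor = target_flops / processors_for_target

# November 2011 figures: Folding@home versus the K computer, in petaFLOPS
folding_to_k_ratio = 6.0 / 8.162

print(f"one ATI processor ~ {computers_per_processor:.0f} computers")
print(f"implied throughput ~ {implied_flops_per_processor / 1e9:.0f} GFLOPS each")
print(f"Folding@home / K computer ~ {folding_to_k_ratio:.2f}")
```

The first ratio works out to 40, which matches the "forty times faster" figure given earlier in the slides.
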
  35. 35. Comparing GPUs to CPUs isn't an apples-to-apples comparison. The clock rates are lower, the architectures are radically different, and the problems they're trying to solve are almost completely unrelated.
  36. 36. Application Processor: Executes application code like MPEG decoding. Sequences the instructions and issues them to stream clients, e.g. the KEU and the DRAM interface. [Kapasi 2003]
  37. 37. Two Stream Clients: KEU: programmable Kernel Execution Unit. DRAM interface: provides access to global data storage. [Kapasi 2003]
  38. 38. KEU: It has two stream-level instructions: 1. load_kernel: loads a compiled kernel function into the local instruction storage inside the KEU. 2. run_kernel: executes the kernel. [Kapasi 2003]
  39. 39. DRAM interface: Two stream-level instructions as well: 1. load_stream: loads an entire stream from DRAM into the SRF. 2. store_stream: stores a stream from the SRF into DRAM. [Kapasi 2003]
  40. 40. Local register files (LRFs): 1. Used for operands of arithmetic operations (similar to caches on CPUs). 2. Exploit fine-grain locality. [Kapasi 2003]
  41. 41. Stream register files (SRFs): 1. Capture coarse-grain locality. 2. Efficiently transfer data to and from the LRFs. [Kapasi 2003]
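
How the application processor might sequence the four stream-level instructions can be sketched with a toy software model. Only the instruction names (`load_kernel`, `run_kernel`, `load_stream`, `store_stream`) come from the slides. The class, its fields, and the `brighten` kernel are hypothetical, and the sketch assumes that `load_stream` moves a whole stream from DRAM into the SRF and `store_stream` moves it back. It is not the real Imagine instruction set or programming interface.

```python
from typing import Callable, Dict, List


class ToyStreamProcessor:
    """Toy model only; not the actual Imagine hardware or its API."""

    def __init__(self, dram: Dict[str, List[int]]) -> None:
        self.dram = dict(dram)                # global data storage (DRAM interface)
        self.srf: Dict[str, List[int]] = {}   # stream register file: whole streams
        self.kernels: Dict[str, Callable[[int], int]] = {}  # KEU instruction storage

    def load_stream(self, name: str) -> None:
        """Assumed semantics: copy an entire stream from DRAM into the SRF."""
        self.srf[name] = list(self.dram[name])

    def store_stream(self, name: str) -> None:
        """Assumed semantics: copy an entire stream from the SRF back to DRAM."""
        self.dram[name] = list(self.srf[name])

    def load_kernel(self, name: str, fn: Callable[[int], int]) -> None:
        """Place a compiled kernel into the KEU's local instruction storage."""
        self.kernels[name] = fn

    def run_kernel(self, kernel: str, src: str, dst: str) -> None:
        """Run the kernel over every record of an SRF stream, producing a new stream."""
        fn = self.kernels[kernel]
        self.srf[dst] = [fn(record) for record in self.srf[src]]


# The application processor's role: issue stream-level instructions in order.
proc = ToyStreamProcessor({"pixels": [10, 20, 30]})
proc.load_stream("pixels")
proc.load_kernel("brighten", lambda p: min(p + 5, 255))
proc.run_kernel("brighten", src="pixels", dst="bright")
proc.store_stream("bright")
```

In this model the intermediate stream lives only in `srf` until it is explicitly stored, which mirrors the idea that the SRF captures coarse-grain locality between kernels.
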
  42. 42. [Kapasi 2003]
  43. 43. Topics learnt today: 1. Stream processing 2. StreamIt language from MIT 3. How modern GPUs use stream processing 4. Imagine Stream Processor from Stanford
