An Implementation of a FIR Filter on a GPU. Alexey Smirnov and Tzi-cker Chiueh. ECSL Research Seminar, 9/13/05
FIR filter on GPU

  1. An Implementation of a FIR Filter on a GPU. Alexey Smirnov and Tzi-cker Chiueh. ECSL Research Seminar, 9/13/05
  2. Outline <ul><li>Introduction </li></ul><ul><li>GPU Computing Overview </li></ul><ul><li>Related Work </li></ul><ul><li>FIR Filter Definition </li></ul><ul><li>FIR Filter Implementation on GPU </li></ul><ul><li>Performance Evaluation </li></ul><ul><li>Conclusion </li></ul>
  3. Introduction <ul><li>Numerical algorithms often perform repeated computations on vectors of elements. </li></ul><ul><li>Parallel computation improves performance. </li></ul><ul><li>x86: MMX, SSE, SSE2, SSE3. </li></ul><ul><li>Video cards are now programmable. </li></ul>
  4. Computation and Bandwidth Rates <ul><li>Video cards have a higher GFLOPS rate and memory bandwidth than CPUs. </li></ul><ul><li>However, copying data between main memory and video memory can reduce performance. </li></ul>
  5. GPU Computing Background <ul><li>Rendering pipeline: </li></ul><ul><ul><li>The user program defines vertex and texture coordinates. </li></ul></ul><ul><ul><li>The vertex processor converts vertex attributes from the world coordinate system into the screen coordinate system. </li></ul></ul><ul><ul><li>The fragment processor computes the color of each output pixel using textures and color. </li></ul></ul><ul><ul><li>Interpolation defines the coordinates and color for each pixel. </li></ul></ul><ul><li>Vertex and fragment processors are programmable, for example in the C-like language Cg. </li></ul>
  6. Rendering APIs <ul><li>OpenGL (Linux, Windows, MacOS) and DirectX (Windows). </li></ul><ul><li>OpenGL extensions expose advanced features of a video card. </li></ul><ul><li>NV_float_buffer supports floating-point textures. </li></ul><ul><li>ARB_render_texture allows rendering to a texture instead of the screen. </li></ul>
  7. GPU Program Architecture <ul><li>Create floating-point textures that contain input data and load them into video memory; </li></ul><ul><li>Load the fragment program and enable multi-texturing; </li></ul><ul><li>Define vertex and texture coordinates; </li></ul><ul><li>Draw the figure to an off-screen buffer; </li></ul><ul><li>If the results were rendered to an off-screen buffer, copy the image to a texture using glCopyTexSubImage2D(); </li></ul><ul><li>Go to step 3 if more iterations are needed; </li></ul><ul><li>Use glGetTexImage() to copy data from video memory to main memory. </li></ul>
  8. Input Data Representation <ul><li>Matrices are represented as textures naturally. Four elements per pixel (R, G, B, A). </li></ul><ul><li>Vectors are wrapped into matrices. Textures have maximum dimensions. </li></ul>
  9. Related Work <ul><li>Four papers describing matrix multiplication; </li></ul><ul><li>Linear algebra operations; </li></ul><ul><li>Array sorting; </li></ul><ul><li>FFT; </li></ul><ul><li>Earlier papers concluded that the CPU is more efficient than the GPU. </li></ul><ul><li>Recent video cards, e.g. GeForce 7800 and ATI X800 XT, outperform the CPU. </li></ul>
  10. FIR Filter Definition <ul><li>A Finite Impulse Response (FIR) filter is used in audio processing. </li></ul><ul><li>We modified GNU Radio, an open-source implementation of Software Defined Radio. </li></ul>
  11. Other Relevant Transformations <ul><li>Hilbert transformation: </li></ul><ul><li>Frequency translation FIR filter: </li></ul>
  12. FIR Filter on a GPU
  13. FIR Filter’s Loop <ul><li>Initialization: </li></ul><ul><li>Loop iteration: </li></ul>
  14. FIR Filter’s Loop <ul><li>O(j+1)=O(j)+MI </li></ul><ul><li>Final output value is computed as </li></ul>
  15. Fragment Program
  16. Optimizations <ul><li>Break the loop into two to get rid of the conditional expression; </li></ul><ul><li>Unroll the loop body with and without the conditional expression; </li></ul><ul><li>Process two rows of input and textures; </li></ul><ul><li>Use different texture units in unrolled loops; </li></ul><ul><li>None of the above improved performance. </li></ul>
  17. Performance Evaluation: FIR Filter
  18. Performance of FreqXlating FIR Filter
  19. Performance of Hilbert Transformation
  20. Conclusion <ul><li>Not everything benefits from GPU optimization. </li></ul><ul><li>CPU optimization tricks do not work on the GPU. </li></ul><ul><li>Texture upload/download takes up to 60% of total time. </li></ul><ul><li>A GPU computation can take several seconds, compared to the milliseconds needed to render a frame in a game. </li></ul>
  21. Future Work <ul><li>QoS for GPU: can an application specify a maximum latency or a share of GPU resources? </li></ul><ul><li>Work offload from CPU to GPU: is it possible to build a compiler that can automatically decide what is worth GPU optimization? </li></ul><ul><li>Debugging support: many tools exist for Windows, none for Linux. </li></ul>
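
Slide 10 names the FIR filter, but its defining formula did not survive in this transcript. For orientation, the textbook direct-form definition computes each output sample as a weighted sum of the current and previous inputs, y[n] = sum over k of h[k] * x[n - k]. The sketch below is a plain CPU reference in Python, not the authors' GPU fragment program or the GNU Radio code; the names `fir_filter`, `samples`, and `taps` are assumptions made for illustration, and inputs before the start of the signal are assumed to be zero.

```python
def fir_filter(samples, taps):
    """Direct-form FIR: y[n] = sum_k taps[k] * samples[n - k].

    Illustrative CPU reference only; samples before index 0 are
    assumed to be zero (an assumption, not stated in the slides).
    """
    out = []
    for n in range(len(samples)):
        acc = 0.0
        for k, tap in enumerate(taps):
            if n - k >= 0:  # skip history that lies before the signal start
                acc += tap * samples[n - k]
        out.append(acc)
    return out


if __name__ == "__main__":
    # Two-tap moving average applied to a short ramp.
    print(fir_filter([1.0, 2.0, 3.0, 4.0], [0.5, 0.5]))
```

Feeding a unit impulse through the function returns the taps themselves, which is a quick way to check any reimplementation against this reference.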

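Slide 8 states that vectors are wrapped into matrices, with four elements per pixel (R, G, B, A) and a limit on texture dimensions. The slides do not give the exact layout, so the following Python sketch assumes one plausible choice: row-major order with zero padding. The names `pack_vector_rgba`, `texel_index`, and `max_width` are hypothetical and do not come from the presentation.

```python
def pack_vector_rgba(values, max_width):
    """Wrap a 1-D vector into rows of RGBA texels (4 values per texel).

    Assumed layout: row-major, zero-padded to a full rectangle.
    """
    if max_width < 1:
        raise ValueError("max_width must be at least 1")
    texels = []
    for i in range(0, len(values), 4):
        chunk = list(values[i:i + 4])
        chunk += [0.0] * (4 - len(chunk))  # pad the last texel with zeros
        texels.append(tuple(chunk))
    width = min(max_width, max(1, len(texels)))
    padding = (-len(texels)) % width  # texels needed to fill the last row
    texels += [(0.0, 0.0, 0.0, 0.0)] * padding
    return [texels[r:r + width] for r in range(0, len(texels), width)]


def texel_index(i, width):
    """Map vector index i to (row, column, channel) in the packed layout."""
    texel, channel = divmod(i, 4)
    row, col = divmod(texel, width)
    return row, col, channel


if __name__ == "__main__":
    rows = pack_vector_rgba(list(range(10)), max_width=2)
    row, col, channel = texel_index(9, 2)
    print(rows[row][col][channel])
```

In a real OpenGL program the row width would be bounded by the card's maximum texture size; here `max_width` simply stands in for that limit.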