GPU Metaprogramming using PyCUDA: Methods & Applications
by NVIDIA on Aug 19, 2010
- 1,891 views
"Writing reliable GPU codes that achieve peak performance in the face of changing requirements and hardware platforms can be a challenging task. In this talk, I will introduce the open-source PyCUDA ...
"Writing reliable GPU codes that achieve peak performance in the face of changing requirements and hardware platforms can be a challenging task. In this talk, I will introduce the open-source PyCUDA toolkit, which assists in this task in a number of ways: Convenient, high-level interface PyCUDA binds all functionality in Nvidia CUDA to a convenient interface in the high-level scripting language Python. Resource management and error checking are automatic. Code Templates PyCUDA comes with tuned and debugged code for many common operations, such as vector math and reductions, which saves debugging and coding time. Metaprogramming PyCUDA allows GPU code to be generated at run-time code, which makes many advanced programming techniques easy–such as empirical optimization, constant folding, and run-time specialization. Scalability PyCUDA covers ”small-scale” and ”large-scale” uses alike: It allows quick prototyping and experimentation,but it also integrates easily into large-scale computational software. Having introduced the toolkit, I will show how PyCUDA has supported a number of applications in computational science:
First, we have successfully used PyCUDA in a high-performance discontinuous Galerkin finite element (DG-FEM) solver. The term DG-FEM describes a family of high-order accurate numerical methods for systems of partial differential equations that model real-world processes such as electromagnetic scattering or fluid flow. We found that these methods’ algorithmic structure makes them very suitable for execution on a GPU, often achieving speedup factors on the order of 50 when compared to a single CPU core. PyCUDA and GPU metaprogrammingwere crucial in achieving this level of performance. As an added benefit, the resulting solver turned out to be very versatile with respect to equation types, domain dimensionality, and discretization parameters. Second, I will discuss a recent effort seeking to automate the writing of high-performance GPU code for a large class of computational kernels that includes many of those needed for the numerical discretization of PDEs. Again, code generation and empirical optimization as provided by PyCUDA provide the basis for the approach that may make GPU performance possible even in situations where manual development is not economical."
© All Rights Reserved
- Embed Views
- Views on SlideShare
- Total Views