Python and GPU Programming (Glib Ivashkevych)


Glib Ivashkevych, HPC software developer / Gero / Ukraine, Kharkiv

GPUs are becoming part of the standard toolkit in high-performance computing. At the same time, new software tools are appearing and existing ones are maturing. We will talk about the architecture of Nvidia GPUs and how to work with them from Python.

http://www.it-sobytie.ru/events/2040

Published in: Education, Technology


  1. Python and GPU Computing
     Glib Ivashkevych, HPC software developer, GERO Lab
  2. Parallel revolution
     "The Free Lunch Is Over: A Fundamental Turn Toward Concurrency in Software" (Herb Sutter, March 2005)
     When serial code hits the wall: the power wall.
     "Now, Intel is embarked on a course already adopted by some of its major rivals: obtaining more computing power by stamping multiple processors on a single chip rather than straining to increase the speed of a single processor." (Paul S. Otellini, Intel CEO, May 2004)
  3. Timeline
     July 2006: Intel launches Core 2 Duo (Conroe)
     Feb 2007: Nvidia releases the CUDA SDK
     Nov 2008: Tsubame, the first GPU-accelerated supercomputer
     Dec 2008: OpenCL 1.0 specification released
     Today: >50 GPU-powered supercomputers in the Top500, 9 in the Top50
  4. "It's very clear that we are close to the tipping point. If we're not at a tipping point, we're racing at it." (Jen-Hsun Huang, NVIDIA co-founder and CEO, March 2013)
     Heterogeneous computing is becoming a standard in HPC, and programming has changed.
  5. Heterogeneous computing
     Host: CPU + main memory
     Device: GPU + GPU memory, with cores grouped into multiprocessors
  6. CPU vs GPU
     CPU: general purpose; sophisticated design and scheduling; perfect for task parallelism
     GPU: highly parallel; huge memory bandwidth; lightweight scheduling; perfect for data parallelism
  7. Anatomy of a GPU: multiprocessors
     A GPU is composed of tens of multiprocessors (streaming multiprocessors), each of which is composed of tens of cores: hundreds of cores in total. Each multiprocessor has its own shared memory.
  8. CUDA: Compute Unified Device Architecture
     CUDA is a hierarchy of computation, memory and synchronization.
  9. Compute hierarchy
     software: kernel
     hardware abstractions → hardware:
     thread → core
     thread block → multiprocessor
     grid of blocks → GPU
  10. Compute hierarchy
      thread: threadIdx
      thread block: blockIdx, blockDim
      grid of blocks: gridDim
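These built-in indices combine into a unique global thread id. A minimal pure-Python sketch of the standard CUDA 1D indexing formula (the function name is illustrative; no GPU needed):

```python
# Each CUDA thread computes its global position from the built-ins:
#   global_id = blockIdx.x * blockDim.x + threadIdx.x
def global_thread_id(block_idx, block_dim, thread_idx):
    return block_idx * block_dim + thread_idx

# A grid of 4 blocks of 256 threads covers 1024 elements,
# one element per thread, with no gaps and no overlaps:
ids = [global_thread_id(b, 256, t) for b in range(4) for t in range(256)]
assert ids == list(range(1024))
```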
  11. Python
      Fast development
      Huge number of packages: data analysis, linear algebra, special functions, etc.
      Metaprogramming
      Convenient, but not that fast at number crunching
  12. PyCUDA
      Wrapper package around the CUDA API
      Convenient abstractions: GPUArray, random number generation, reductions & scans, etc.
      Automatic cleanup, initialization and error checking; kernel caching
      Completeness
  13. GPUArray
      NumPy-like interface for GPU arrays
      Convenient creation and manipulation routines
      Elementwise operations
      Cleanup
  14. SourceModule
      Abstraction to create, compile and run GPU code
      GPU code to compile is passed as a string
      Control over nvcc compiler options
      Convenient interface to get kernels
  15. Metaprogramming
      GPU code can be created at runtime
      PyCUDA uses the mako template engine internally
      Any template engine is fine for generating GPU source code; remember about codepy
      Create more flexible and optimized code
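The slides mention mako, but the idea works with any templating tool; as a dependency-free sketch, here is the same pattern with the standard library's `string.Template`, baking a type and a constant into kernel source at runtime:

```python
from string import Template

# Kernel source generated at runtime: the element type and the
# scaling factor are substituted in before compilation.
kernel_template = Template("""
__global__ void scale(${dtype} *a)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    a[i] *= ${factor};
}
""")

source = kernel_template.substitute(dtype="float", factor="2.0f")
print(source)  # ready to hand to pycuda.compiler.SourceModule
```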
  16. Installation
      Required: numpy, mako, CUDA driver & toolkit
      Optional: Boost.Python
      Dev packages needed if you build from source
      Also: PyOpenCL, pyfft
  17. NumbaPro
      Accelerator package for Python
      Generates machine code from Python scalar functions (creates a ufunc)

          from numbapro import vectorize
          import numpy as np

          @vectorize(['float32(float32, float32)'], target='cpu')
          def add2(a, b):
              return a + b

          X = np.ones(1024, dtype='float32')
          Y = 2 * np.ones(1024, dtype='float32')
          print(add2(X, Y))   # [3., 3., … 3.]
  18. GPU computing resources
      Documentation
      "Intro to Parallel Programming" by David Luebke (Nvidia) and John Owens (UC Davis)
      "Heterogeneous Parallel Programming" by Wen-mei W. Hwu (UIUC)
      Tesla K20/K40 test drive: http://www.nvidia.ru/object/k40-gpu-test-drive-ru.html
