PyCUDA:
Harnessing the power of GPU with Python
Talk Structure

1. Why a GPU?
2. How does it work?
3. How do I program it?
4. Can I use Python?

WHY A GPU?
PyCon 4 – Florence 2010 – Fabrizio Milo

APPLICATIONS & DEMOS

Why GPU?

How does it work?

[Figure: CPU vs. GPU block diagram (labels: Control, ALU, Cache, DRAM)]

CUDA

Compute Unified Device Architecture

CUDA
A Parallel Computing Architecture for NVIDIA GPUs
[Figure: CUDA architecture diagram (label: DirectX Compute)]

CUDA: Execution Model and Device Model

EXECUTION MODEL

Thread
The smallest unit of execution

A Block
A group of threads

A Grid
A group of blocks

One block can have many threads

One grid can have many blocks

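The thread, block and grid hierarchy can be mimicked with two nested loops. The Python sketch below only illustrates the bookkeeping, not how a GPU actually schedules work: `launch_1d` and `record` are hypothetical names, and the calls run one after another where a GPU would run them in parallel.

```python
from typing import Callable, List, Tuple


def launch_1d(n_blocks: int, threads_per_block: int,
              kernel: Callable[[int, int], None]) -> None:
    """Call kernel(block_idx, thread_idx) once for every thread of a 1-D grid.

    The outer loop walks the grid (a group of blocks), the inner loop walks
    one block (a group of threads). A GPU would run these calls in parallel,
    so a real kernel must not depend on the order used here.
    """
    for block_idx in range(n_blocks):
        for thread_idx in range(threads_per_block):
            kernel(block_idx, thread_idx)


visited: List[Tuple[int, int]] = []


def record(block_idx: int, thread_idx: int) -> None:
    # One call == one thread; remember its (block, thread) coordinates.
    visited.append((block_idx, thread_idx))


launch_1d(n_blocks=3, threads_per_block=4, kernel=record)
print(len(visited))   # 12
print(visited[:5])    # [(0, 0), (0, 1), (0, 2), (0, 3), (1, 0)]
```

A grid of 3 blocks with 4 threads each yields 12 threads; in CUDA the block is identified by `blockIdx.x` and the thread inside it by `threadIdx.x`.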

The hardware

DEVICE MODEL

Scalar Processor

Many Scalar Processors

+ Register File

+ Shared Memory

Multiprocessor

Device

Real Example: 10-Series Architecture

• 240 Scalar Processor (SP) cores execute kernel threads
• 30 Streaming Multiprocessors (SMs), each containing (30 × 8 = 240):
    • 8 scalar processors
    • 1 double-precision unit
    • shared memory

Software        Hardware
Thread          Scalar Processor
Thread Block    Multiprocessor
Grid            Device

Global Memory

[Figure: Host – Device (labels: RAM, CPU, Global Memory)]

[Figure: Host – Multi Device (labels: RAM, CPU)]

Kernel

__global__ void multiply_them( float *dest, float *a, float *b )
{
    // one thread per element: the thread's index inside its block picks the element
    const int i = threadIdx.x;
    dest[i] = a[i] * b[i];
}
[Figure labels: Thread, Block]

Kernel

__global__ void kernel( … )
{
    // global index = block index * threads per block + thread index in the block
    const int idx = blockIdx.x * blockDim.x + threadIdx.x;
    …
}
[Figure label: Grid]

How do I program it?

[Figure: Main Logic → GCC → .bin → CPU; Kernel → NVCC → .cubin → GPU]

Allocate Memory

cudaMalloc( &pointer, size )

Copy to device

cudaMemcpy( dest, src, size, cudaMemcpyHostToDevice )

Kernel Launch

Kernel<<< n_blocks, n_threads >>>( params )

Get Back the Results

cudaMemcpy( dest, src, size, cudaMemcpyDeviceToHost )

Error Handling

if (cudaMalloc( &pointer, size ) != cudaSuccess) {
    handle_error();
}

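Checking every return value by hand is what makes CUDA C host code grow. A common alternative is to inspect status codes in exactly one place and raise an exception on failure, which is the spirit of the "Check and Report Errors" goal in the PyCUDA philosophy. The Python sketch below only illustrates that pattern: `CudaError`, `check`, `fake_malloc` and the numeric status values are hypothetical stand-ins, not part of the CUDA or PyCUDA APIs.

```python
SUCCESS = 0  # hypothetical stand-in for cudaSuccess


class CudaError(RuntimeError):
    """Raised when a (simulated) CUDA call reports a non-zero status."""

    def __init__(self, call: str, status: int) -> None:
        super().__init__(f"{call} failed with status {status}")
        self.call = call
        self.status = status


def check(call: str, status: int) -> None:
    """The single place where a status code is inspected."""
    if status != SUCCESS:
        raise CudaError(call, status)


def fake_malloc(size: int) -> int:
    # Hypothetical allocator: requests above 1024 bytes "run out of memory".
    return SUCCESS if size <= 1024 else 2


def run_pipeline(size: int) -> str:
    # Flat control flow: each step either succeeds or raises.
    check("cudaMalloc", fake_malloc(size))
    check("cudaMemcpy", SUCCESS)
    return "done"


print(run_pipeline(256))      # done
try:
    run_pipeline(4096)
except CudaError as err:
    print(err)                # cudaMalloc failed with status 2
```

The caller handles failure once, in the `except` clause, instead of after every call.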

And soon it becomes …

if (cudaMalloc( &pointer, size ) != cudaSuccess) {
    handle_error();
}
if (cudaMemcpy( dest, src, size, cudaMemcpyHostToDevice ) != cudaSuccess) {
    handle_error();
}
Kernel<<< n_blocks, n_threads >>>( params );
if (cudaGetLastError() != cudaSuccess) {    // a kernel launch itself returns nothing
    handle_error();
}
if (cudaMemcpy( dest, src, size, cudaMemcpyDeviceToHost ) != cudaSuccess) {
    handle_error();
}
… and the same pattern again around every call in the program.

Can I use Python?

Python + CUDA & Andreas Klöckner = PyCUDA

PyCUDA Philosophy

• Provide complete access
• Automatically manage resources
• Check and report errors
• Cross-platform
• Allow interactive use
• NumPy integration

NUMPY - ARRAY

[Figure: a row of 100 cells, each holding 1, indexed 0 … 99]

import numpy

my_array = numpy.array([1,] * 100)   # 100 elements, all equal to 1 (indices 0 … 99)

[Figure: the same row of cells, now with a 0 at index 3]

my_array[3] = 0                      # overwrite the element at index 3

PyCUDA: Workflow

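On the NumPy slide, `numpy.array([1,] * 100)` produces an integer array, whereas the kernels in this talk take `float *` arguments. Only raw bytes travel to the GPU, so the host array's dtype should match the kernel's parameter type, and `cuda.mem_alloc` takes a size in bytes. This NumPy-only check (no GPU needed) shows the conversion and the resulting byte count:

```python
import numpy

my_array = numpy.array([1,] * 100)
# Integer dtype; its width (int32 or int64) depends on the platform.
print(my_array.dtype.kind)                        # i

# Convert once on the host so the bytes match a `float *` kernel parameter.
host = my_array.astype(numpy.float32)
print(host.dtype, host.itemsize, host.nbytes)     # float32 4 400

# host.nbytes is the value to pass as size_bytes to cuda.mem_alloc.
size_bytes = host.nbytes
```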

Memory Allocation

import numpy
import pycuda.autoinit              # initialises CUDA and creates a default context
import pycuda.driver as cuda

a = numpy.random.randn(400).astype(numpy.float32)
b = numpy.random.randn(400).astype(numpy.float32)
dest = numpy.empty_like(a)

a_gpu = cuda.mem_alloc(a.nbytes)    # mem_alloc takes a size in bytes
b_gpu = cuda.mem_alloc(b.nbytes)
dest_gpu = cuda.mem_alloc(dest.nbytes)

Memory Copy

cuda.memcpy_htod(a_gpu, a)          # host (CPU) -> device (GPU)
cuda.memcpy_htod(b_gpu, b)

Kernel

from pycuda.compiler import SourceModule

mod = SourceModule("""
__global__ void multiply_them( float *dest, float *a, float *b )
{
    const int i = threadIdx.x;
    dest[i] = a[i] * b[i];
}
""")

Kernel Launch

multiply_them = mod.get_function("multiply_them")
# one block of 400 threads, one thread per element
multiply_them(dest_gpu, a_gpu, b_gpu, block=(400, 1, 1), grid=(1, 1))

cuda.memcpy_dtoh(dest, dest_gpu)    # device -> host: copy the result back
assert numpy.allclose(dest, a * b)

Hello GPU

DEMO

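The Hello GPU kernel indexes with `threadIdx.x` alone, so it only works while every element fits into a single block (at most 512 threads per block on the 10-series hardware described earlier). For larger inputs the usual approach combines the grid index `blockIdx.x * blockDim.x + threadIdx.x` from the Kernel slides, a block count rounded up, and an `if (idx < n)` guard inside the kernel. The CPU emulation below uses hypothetical helper names and needs no GPU; the tuples it computes have the shape PyCUDA's `block=` and `grid=` launch arguments expect.

```python
from typing import List, Tuple


def launch_config(n: int, threads_per_block: int = 256
                  ) -> Tuple[Tuple[int, int, int], Tuple[int, int]]:
    """Return (block, grid) tuples that cover n elements."""
    n_blocks = (n + threads_per_block - 1) // threads_per_block  # round up
    return (threads_per_block, 1, 1), (n_blocks, 1)


def multiply_them_emulated(a: List[float], b: List[float],
                           threads_per_block: int = 256) -> List[float]:
    """Run a grid-indexed multiply_them on the CPU, one loop pass per thread."""
    n = len(a)
    dest = [0.0] * n
    block, grid = launch_config(n, threads_per_block)
    block_dim = block[0]
    for block_idx in range(grid[0]):
        for thread_idx in range(block_dim):
            idx = block_idx * block_dim + thread_idx  # blockIdx.x * blockDim.x + threadIdx.x
            if idx < n:  # guard: threads past the end of the data do nothing
                dest[idx] = a[idx] * b[idx]
    return dest


block, grid = launch_config(1000)
print(block, grid)   # (256, 1, 1) (4, 1)

a = [float(i) for i in range(1000)]
b = [2.0] * 1000
result = multiply_them_emulated(a, b)
print(result[999])   # 1998.0
```

Four blocks of 256 give 1024 threads for 1000 elements. Removing the `if idx < n` line makes the emulation raise IndexError as soon as the first of the 24 surplus threads runs, the CPU analogue of an out-of-bounds access on the device.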

GPUARRAY

PyCUDA: GpuArray

   gpu_array = gpuarray.to_gpu(numpy_array)

   numpy_array = gpu_array.get()

   +, -, *, /, fill, sin, exp, rand, basic indexing, norm, inner product …

PyCUDA: GpuArray: ElementWise

import numpy.linalg as la
import pycuda.autoinit                      # initialises CUDA and creates a default context
import pycuda.gpuarray as gpuarray
from pycuda.curandom import rand as curand
from pycuda.elementwise import ElementwiseKernel

a_gpu = curand((50,))                       # random float32 vectors on the GPU
b_gpu = curand((50,))

# z = a*x + b*y, evaluated element by element on the GPU
lincomb = ElementwiseKernel(
    "float a, float *x, float b, float *y, float *z",
    "z[i] = a*x[i] + b*y[i]")

c_gpu = gpuarray.empty_like(a_gpu)
lincomb(5, a_gpu, 6, b_gpu, c_gpu)

# compare with the same expression computed through GpuArray arithmetic
assert la.norm((c_gpu - (5*a_gpu + 6*b_gpu)).get()) < 1e-5

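`ElementwiseKernel` only asks for the loop body (`z[i] = a*x[i] + b*y[i]`); the library wraps it in a complete kernel and picks the launch size. One common way to write such a wrapper is a grid-stride loop: each thread starts at its global index and advances by the total number of threads, so a fixed-size launch covers any `n`. This sketch does not claim to reproduce PyCUDA's generated code exactly; it is a CPU emulation with a hypothetical function name that illustrates the loop shape and checks that every element is written exactly once.

```python
from typing import List


def lincomb_grid_stride(a: float, x: List[float], b: float, y: List[float],
                        n_blocks: int, block_dim: int) -> List[float]:
    """Emulate z[i] = a*x[i] + b*y[i] with a grid-stride loop on the CPU."""
    n = len(x)
    z = [0.0] * n
    visits = [0] * n
    total_threads = n_blocks * block_dim
    for block_idx in range(n_blocks):
        for thread_idx in range(block_dim):
            i = block_idx * block_dim + thread_idx  # this thread's first element
            while i < n:                            # a thread may handle several elements
                z[i] = a * x[i] + b * y[i]
                visits[i] += 1
                i += total_threads                  # jump ahead by the size of the grid
    assert all(v == 1 for v in visits), "each element must be written exactly once"
    return z


x = [float(k) for k in range(50)]
y = [1.0] * 50
# Only 2 blocks of 8 threads (16 threads) for 50 elements; the loop covers the rest.
z = lincomb_grid_stride(5, x, 6, y, n_blocks=2, block_dim=8)
print(z[:3])   # [6.0, 11.0, 16.0]
```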

Meta-Programming

__kernel_template__ = """
__global__ void kernel( args )
{
    for (int i = 0; i < {{ iterations }}; i++) {
        {{ operations }}
    }
}
"""

See, for example, Jinja2.

Generate source!

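The template above uses Jinja2-style `{{ name }}` placeholders. Jinja2 would render it directly; to keep this sketch dependency-free, the `render` helper below is a hypothetical minimal substitute built on the standard `re` module. The kernel parameter `float *data` and the chosen operation are illustrative assumptions; the resulting string is the kind of source one would hand to `SourceModule`.

```python
import re

KERNEL_TEMPLATE = """
__global__ void kernel( float *data )
{
    for (int i = 0; i < {{ iterations }}; i++) {
        {{ operations }}
    }
}
"""


def render(template: str, **params: object) -> str:
    """Replace every {{ name }} placeholder with str(params[name])."""
    return re.sub(r"\{\{\s*(\w+)\s*\}\}",
                  lambda m: str(params[m.group(1)]),
                  template)


source = render(KERNEL_TEMPLATE,
                iterations=16,
                operations="data[threadIdx.x] *= 2.0f;")
print(source)
```

Because the loop bound becomes the literal `16`, the CUDA compiler sees a constant trip count that it can, for instance, unroll; a different value means generating and compiling a new kernel.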

Performance?

Mandelbrot

DEMO

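The slides do not reproduce the Mandelbrot demo's source. The fractal is a classic GPU showcase because every pixel is computed independently, which maps naturally onto one thread per pixel. For reference, the per-pixel escape-time computation on the CPU can be sketched as follows; the function names, the iteration limit of 50, the escape radius 2 and the plotted region are the usual textbook choices, not taken from the demo.

```python
from typing import List


def escape_time(c: complex, max_iter: int = 50) -> int:
    """Iterations before z -> z*z + c leaves the radius-2 disc (max_iter if it never does)."""
    z = 0j
    for n in range(max_iter):
        if abs(z) > 2.0:
            return n
        z = z * z + c
    return max_iter


def mandelbrot(width: int, height: int, max_iter: int = 50) -> List[List[int]]:
    """Escape times on a width x height grid over [-2, 1] x [-1.5, 1.5].

    Each (row, col) is independent of every other, so on a GPU each pixel
    could be one thread.
    """
    rows = []
    for row in range(height):
        y = -1.5 + 3.0 * row / (height - 1)
        rows.append([escape_time(complex(-2.0 + 3.0 * col / (width - 1), y), max_iter)
                     for col in range(width)])
    return rows


print(escape_time(0j))        # 50: the origin belongs to the set
print(escape_time(2 + 2j))    # 1
```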

PyCUDA: Documentation

PyCUDA

Website:
http://mathema.tician.de/software/pycuda

License:
X Consortium License
  (no warranty, free for all use)

Dependencies:
Python 2.4+, numpy, Boost

In the Future …

    OPENCL

THANK YOU & HAVE FUN!

?