PyCUDA:
Harnessing the power of GPU with Python
Talk Structure




                    1. Why a GPU?
                    2. How does it work?
                    3. How do I program it?
                    4. Can I use Python?

PyCon 4 – Florence 2010 – Fabrizio Milo
WHY A GPU?

APPLICATIONS & DEMOS

Why a GPU?

How does it work?
[Figure: CPU vs. GPU. The CPU is drawn with Control, ALUs, Cache and DRAM; the GPU with its own DRAM.]
CUDA

Compute Unified Device Architecture

CUDA
                      A Parallel Computing Architecture for NVIDIA GPUs

                                                DirectX Compute

CUDA: Execution Model + Device Model

EXECUTION MODEL
Thread
                            Smallest unit of logic

A Block
                            A Group of Threads

A Grid
                            A Group of Blocks

One Block can have many threads

One Grid can have many blocks
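To make the thread/block/grid hierarchy concrete, here is a small CPU-side sketch. `LaunchConfig` is a hypothetical helper (not part of CUDA or PyCUDA) that describes a launch as a grid of blocks of threads, each given as (x, y, z) extents, and counts how many threads the launch creates.

```python
from dataclasses import dataclass
from math import prod
from typing import Tuple

Dim3 = Tuple[int, int, int]


@dataclass(frozen=True)
class LaunchConfig:
    """Hypothetical description of a kernel launch: a grid of blocks of threads."""
    grid: Dim3   # number of blocks along x, y, z
    block: Dim3  # number of threads per block along x, y, z

    @property
    def num_blocks(self) -> int:
        return prod(self.grid)

    @property
    def threads_per_block(self) -> int:
        return prod(self.block)

    @property
    def total_threads(self) -> int:
        # Every block in a grid has the same shape, so the total is a product.
        return self.num_blocks * self.threads_per_block


cfg = LaunchConfig(grid=(4, 1, 1), block=(256, 1, 1))
print(cfg.num_blocks, cfg.threads_per_block, cfg.total_threads)  # 4 256 1024
```

Real hardware also caps the size of a block (512 threads per block on the 10-series GPUs), which this sketch does not check.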
The hardware

     DEVICE MODEL

Scalar Processor

Many Scalar Processors
  + Register File
  + Shared Memory
  = Multiprocessor

Device
Real Example: 10-Series Architecture

  • 240 Scalar Processor (SP) cores execute kernel threads
  • 30 Streaming Multiprocessors (SMs), each containing:
      • 8 scalar processors (30 × 8 = 240 SP cores in total)
      • 1 double-precision unit
      • Shared memory
Software        Hardware
Thread          Scalar Processor
Thread Block    Multiprocessor
Grid            Device
Global Memory

[Figure: Host - Device. The host (CPU and its RAM) alongside the device's Global Memory.]

[Figure: Host - Multi Device. A single host (CPU and its RAM) with several devices.]
Kernel

__global__ void multiply_them( float *dest,
                               float *a,
                               float *b )
{
   const int i = threadIdx.x;
   dest[i] = a[i] * b[i];
}

   Thread: every thread executes the whole function body once.
   Block: threadIdx.x is the thread's index inside its block, so one block
   of N threads computes N products.
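The kernel body says what one thread does; the launch decides how many threads run it. A minimal CPU emulation in Python/NumPy makes that split explicit. The helper `launch_single_block` and the `thread_idx_x` parameter are hypothetical names standing in for the block of threads and for `threadIdx.x`.

```python
import numpy as np


def multiply_them(dest, a, b, thread_idx_x):
    # Body of the CUDA kernel, as executed by ONE thread.
    i = thread_idx_x
    dest[i] = a[i] * b[i]


def launch_single_block(kernel, block_dim_x, *args):
    # Run the kernel once per thread of a single 1-D block, sequentially.
    # On a GPU these threads run in parallel; the order is irrelevant here
    # because every thread writes a different element of dest.
    for t in range(block_dim_x):
        kernel(*args, thread_idx_x=t)


a = np.arange(8, dtype=np.float32)
b = np.full(8, 2.0, dtype=np.float32)
dest = np.zeros_like(a)
launch_single_block(multiply_them, len(a), dest, a, b)
print(dest)
```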
Kernel

__global__ void kernel( … )
{
   const int idx = blockIdx.x * blockDim.x + threadIdx.x;
   …
}

   Grid: with many blocks, blockIdx.x * blockDim.x is the offset of the
   thread's block, and adding threadIdx.x gives a unique global index.
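The global-index formula above is usually paired with two details: the number of blocks is rounded up so the grid covers all the data, and each thread checks that its index is in range. A sequential sketch in Python/NumPy, where `scale_kernel` and `launch_grid` are hypothetical names and the loops stand in for the grid's blocks and threads:

```python
import numpy as np


def scale_kernel(out, x, n, block_idx_x, block_dim_x, thread_idx_x):
    # Hypothetical kernel body using the slide's global-index formula.
    idx = block_idx_x * block_dim_x + thread_idx_x
    # The last block can extend past the end of the data, so guard the access.
    if idx < n:
        out[idx] = 2.0 * x[idx]


def launch_grid(kernel, grid_dim_x, block_dim_x, *args):
    # Sequential stand-in for a 1-D grid of 1-D blocks.
    for blk in range(grid_dim_x):
        for thr in range(block_dim_x):
            kernel(*args, block_idx_x=blk, block_dim_x=block_dim_x,
                   thread_idx_x=thr)


n = 1000
block_dim = 256
grid_dim = (n + block_dim - 1) // block_dim  # ceiling division: 4 blocks
x = np.arange(n, dtype=np.float32)
out = np.zeros_like(x)
launch_grid(scale_kernel, grid_dim, block_dim, out, x, n)
```

Here 4 blocks of 256 threads give 1024 threads for 1000 elements; the guard keeps the last 24 threads from writing past the end of `out`.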
How do I Program it?

  Main Logic  → GCC  → .bin     runs on the CPU
  Kernel      → NVCC → .cubin   runs on the GPU
Allocate Memory → Copy to Device → Kernel Launch → Get Back the Results

cudaMalloc( &pointer, size );                              // allocate memory
cudaMemcpy( dest, src, size, cudaMemcpyHostToDevice );     // copy to device
kernel<<< num_blocks, num_threads >>>( params );           // kernel launch
cudaMemcpy( dest, src, size, cudaMemcpyDeviceToHost );     // get back the results
Error Handling

if (cudaMalloc( &pointer, size ) != cudaSuccess) {
   handle_error();
}
And soon it becomes …

if (cudaMalloc( &pointer, size ) != cudaSuccess) {
   handle_error();
}

if (cudaMemcpy( dest, src, size, cudaMemcpyHostToDevice ) != cudaSuccess) {
   handle_error();
}

kernel<<< num_blocks, num_threads >>>( params );
if (cudaGetLastError() != cudaSuccess) {   // a kernel launch returns no status itself
   handle_error();
}

if (cudaMemcpy( dest, src, size, cudaMemcpyDeviceToHost ) != cudaSuccess) {
   handle_error();
}
And soon it becomes …

[Slide: the same error-checking code repeated six times, filling the screen.]
Python + CUDA & Andreas Klöckner = PyCUDA
PyCuda Philosophy

  • Provide complete access
  • Automatically manage resources
  • Check and report errors
  • Cross-platform
  • Allow interactive use
  • NumPy integration
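"Check and report errors" is what removes the wall of `if (... != cudaSuccess)` blocks: PyCUDA reports failures as Python exceptions instead of status codes. The idea fits in a few lines of plain Python. `CudaError`, `check` and `fake_malloc` are hypothetical names for illustration only, not PyCUDA's API; `fake_malloc` returns an arbitrary nonzero code to simulate a failure.

```python
CUDA_SUCCESS = 0  # the CUDA APIs use 0 to mean "no error"


class CudaError(RuntimeError):
    """Hypothetical exception raised when a wrapped call reports failure."""


def check(status, call_name):
    # Convert a C-style status code into a Python exception, so callers
    # no longer need an if-statement after every single call.
    if status != CUDA_SUCCESS:
        raise CudaError(f"{call_name} failed with status {status}")
    return status


def fake_malloc(size):
    # Hypothetical stand-in for a C call that signals failure by status code;
    # 1 is an arbitrary nonzero code, not a specific CUDA error.
    return CUDA_SUCCESS if size > 0 else 1


check(fake_malloc(1024), "fake_malloc")      # succeeds silently
try:
    check(fake_malloc(0), "fake_malloc")
except CudaError as err:
    print("caught:", err)
```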
NUMPY - ARRAY

import numpy

my_array = numpy.array([1,] * 100)    # 1 1 1 1 … 1  (indices 0 … 99)

my_array[3] = 0                       # 1 1 1 0 … 1
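One detail matters as soon as a NumPy array is handed to a GPU kernel: its element type must match the kernel's C signature. `numpy.array([1,] * 100)` produces an integer array, while the kernels in this talk take `float *`, i.e. 32-bit floats. A short sketch of the float32 version, whose `nbytes` is the byte count a device allocation such as `cuda.mem_alloc(size_bytes)` would need:

```python
import numpy

# float32 matches the kernels' `float *` parameters.
my_array = numpy.ones(100, dtype=numpy.float32)
my_array[3] = 0

# 100 elements of 4 bytes each: the size of a device-side copy.
print(my_array.dtype, my_array.itemsize, my_array.nbytes)  # float32 4 400
```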
PyCuda: Workflow
Memory Allocation

import pycuda.autoinit                 # initializes CUDA and creates a context on the first GPU
import pycuda.driver as cuda

gpu_mem = cuda.mem_alloc( size_bytes )

Memory Copy

cuda.memcpy_htod( gpu_mem, cpu_mem )   # host (e.g. a NumPy array) -> device
Kernel Launch

import numpy
import pycuda.autoinit                  # creates a context on the first GPU
import pycuda.driver as cuda
from pycuda.compiler import SourceModule

a = numpy.random.randn(400).astype(numpy.float32)
b = numpy.random.randn(400).astype(numpy.float32)
dest = numpy.empty_like(a)

a_gpu = cuda.mem_alloc(a.nbytes)
b_gpu = cuda.mem_alloc(b.nbytes)
dest_gpu = cuda.mem_alloc(dest.nbytes)
cuda.memcpy_htod(a_gpu, a)
cuda.memcpy_htod(b_gpu, b)

mod = SourceModule("""
__global__ void multiply_them( float *dest, float *a, float *b )
{
   const int i = threadIdx.x;
   dest[i] = a[i] * b[i];
}""")

multiply_them = mod.get_function("multiply_them")

# The kernel only reads threadIdx.x, so launch one block of 400 threads
# along x (10-series GPUs allow at most 512 threads per block).
multiply_them(dest_gpu, a_gpu, b_gpu, block=(400, 1, 1), grid=(1, 1))

cuda.memcpy_dtoh(dest, dest_gpu)        # get back the results
Hello GPU

     DEMO

GPUARRAY
PyCuda: GpuArray

   import pycuda.autoinit
   import pycuda.gpuarray as gpuarray

   a_gpu = gpuarray.to_gpu(numpy_array)     # copy a NumPy array to the GPU
   numpy_array = a_gpu.get()                # copy it back as a NumPy array

     +, -, *, /, fill, sin, exp, rand, basic
     indexing, norm, inner product …
PyCuda: GpuArray: ElementWise

import numpy.linalg as la
import pycuda.autoinit
import pycuda.gpuarray as gpuarray
from pycuda.curandom import rand as curand
from pycuda.elementwise import ElementwiseKernel

# C argument list, then the C statement executed for every index i
lin_comb = ElementwiseKernel(
      "float a, float *x, float b, float *y, float *z",
      "z[i] = a*x[i] + b*y[i]"
)

a_gpu = curand((50,))                   # random float32 arrays on the GPU
b_gpu = curand((50,))

c_gpu = gpuarray.empty_like(a_gpu)
lin_comb(5, a_gpu, 6, b_gpu, c_gpu)

assert la.norm((c_gpu - (5*a_gpu + 6*b_gpu)).get()) < 1e-5
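Conceptually, the linear-combination ElementwiseKernel above runs its C snippet once for every element index `i`. A CPU-only sketch of those semantics is handy for checking GPU results on a machine without a GPU. `elementwise` is a hypothetical helper, and a Python function replaces the C snippet.

```python
import numpy as np


def elementwise(operation, n):
    # Sequential stand-in: call operation(i) for every index i in [0, n).
    # On the GPU, PyCUDA spreads these indices over many threads instead.
    for i in range(n):
        operation(i)


rng = np.random.default_rng(0)
x = rng.random(50, dtype=np.float32)
y = rng.random(50, dtype=np.float32)
z = np.empty_like(x)
a, b = 5, 6


def lin_comb(i):
    # Python counterpart of the C snippet "z[i] = a*x[i] + b*y[i]".
    z[i] = a * x[i] + b * y[i]


elementwise(lin_comb, len(z))
print(np.allclose(z, a * x + b * y))
```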
Meta-Programming

__kernel_template__ = """
__global__ void kernel( args )
{
   for (int i = 0; i < {{ iterations }}; i++) {
      {{ operations }}
   }
}"""

  See, for example, Jinja2
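The template above is ordinary text until it is rendered with concrete values. Below is a minimal sketch of the same idea using only the standard library's `str.format` in place of Jinja2, to avoid a dependency. The kernel signature `kernel(float *x)` and the function name `render_kernel` are hypothetical. In `str.format`, the C code's literal braces must be doubled.

```python
# Standard-library stand-in for the Jinja2 template on the slide:
# {{ and }} become literal braces, {iterations} and {operations} are filled in.
KERNEL_TEMPLATE = """
__global__ void kernel(float *x)
{{
    const int idx = blockIdx.x * blockDim.x + threadIdx.x;
    for (int i = 0; i < {iterations}; i++) {{
        {operations}
    }}
}}"""


def render_kernel(iterations, operations):
    # Produce CUDA C source specialised for the given parameters.
    return KERNEL_TEMPLATE.format(iterations=iterations, operations=operations)


src = render_kernel(100, "x[idx] = 0.5f * x[idx] + 1.0f;")
print(src)
```

On a machine with a CUDA GPU, the resulting string could be passed to `pycuda.compiler.SourceModule`. A real template engine such as Jinja2 additionally allows loops and conditionals inside the template itself.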
Meta-Programming

         Generate Source!
Performance?

Mandelbrot

     DEMO

PyCuda: Documentation
PyCuda

Website:
  http://mathema.tician.de/software/pycuda

License:
  X Consortium License
  (no warranty, free for all use)

Dependencies:
  Python 2.4+, NumPy, Boost
In the Future …

    OpenCL

THANK YOU & HAVE FUN!

?

More Related Content

More from PyCon Italia

Spyppolare o non spyppolare
Spyppolare o non spyppolareSpyppolare o non spyppolare
Spyppolare o non spyppolarePyCon Italia
Β 
zc.buildout: "Un modo estremamente civile per sviluppare un'applicazione"
zc.buildout: "Un modo estremamente civile per sviluppare un'applicazione"zc.buildout: "Un modo estremamente civile per sviluppare un'applicazione"
zc.buildout: "Un modo estremamente civile per sviluppare un'applicazione"PyCon Italia
Β 
Undici anni di lavoro con Python
Undici anni di lavoro con PythonUndici anni di lavoro con Python
Undici anni di lavoro con PythonPyCon Italia
Β 
socket e SocketServer: il framework per i server Internet in Python
socket e SocketServer: il framework per i server Internet in Pythonsocket e SocketServer: il framework per i server Internet in Python
socket e SocketServer: il framework per i server Internet in PythonPyCon Italia
Β 
Qt mobile PySide bindings
Qt mobile PySide bindingsQt mobile PySide bindings
Qt mobile PySide bindingsPyCon Italia
Β 
Python: ottimizzazione numerica algoritmi genetici
Python: ottimizzazione numerica algoritmi geneticiPython: ottimizzazione numerica algoritmi genetici
Python: ottimizzazione numerica algoritmi geneticiPyCon Italia
Β 
Python idiomatico
Python idiomaticoPython idiomatico
Python idiomaticoPyCon Italia
Β 
Python in the browser
Python in the browserPython in the browser
Python in the browserPyCon Italia
Β 
PyPy 1.2: snakes never crawled so fast
PyPy 1.2: snakes never crawled so fastPyPy 1.2: snakes never crawled so fast
PyPy 1.2: snakes never crawled so fastPyCon Italia
Β 
OpenERP e l'arte della gestione aziendale con Python
OpenERP e l'arte della gestione aziendale con PythonOpenERP e l'arte della gestione aziendale con Python
OpenERP e l'arte della gestione aziendale con PythonPyCon Italia
Β 
New and improved: Coming changes to the unittest module
 	 New and improved: Coming changes to the unittest module 	 New and improved: Coming changes to the unittest module
New and improved: Coming changes to the unittest modulePyCon Italia
Β 
Monitoraggio del Traffico di Rete Usando Python ed ntop
Monitoraggio del Traffico di Rete Usando Python ed ntopMonitoraggio del Traffico di Rete Usando Python ed ntop
Monitoraggio del Traffico di Rete Usando Python ed ntopPyCon Italia
Β 
Jython for embedded software validation
Jython for embedded software validationJython for embedded software validation
Jython for embedded software validationPyCon Italia
Β 
Foxgame introduzione all'apprendimento automatico
Foxgame introduzione all'apprendimento automaticoFoxgame introduzione all'apprendimento automatico
Foxgame introduzione all'apprendimento automaticoPyCon Italia
Β 
Effective EC2
Effective EC2Effective EC2
Effective EC2PyCon Italia
Β 
Django Γ¨ pronto per l'Enterprise
Django Γ¨ pronto per l'EnterpriseDjango Γ¨ pronto per l'Enterprise
Django Γ¨ pronto per l'EnterprisePyCon Italia
Β 
Crogioli, alambicchi e beute: dove mettere i vostri dati.
Crogioli, alambicchi e beute: dove mettere i vostri dati.Crogioli, alambicchi e beute: dove mettere i vostri dati.
Crogioli, alambicchi e beute: dove mettere i vostri dati.PyCon Italia
Β 
Comet web applications with Python, Django & Orbited
Comet web applications with Python, Django & OrbitedComet web applications with Python, Django & Orbited
Comet web applications with Python, Django & OrbitedPyCon Italia
Β 
Cleanup and new optimizations in WPython 1.1
Cleanup and new optimizations in WPython 1.1Cleanup and new optimizations in WPython 1.1
Cleanup and new optimizations in WPython 1.1PyCon Italia
Β 

More from PyCon Italia (19)

Spyppolare o non spyppolare
Spyppolare o non spyppolareSpyppolare o non spyppolare
Spyppolare o non spyppolare
Β 
zc.buildout: "Un modo estremamente civile per sviluppare un'applicazione"
zc.buildout: "Un modo estremamente civile per sviluppare un'applicazione"zc.buildout: "Un modo estremamente civile per sviluppare un'applicazione"
zc.buildout: "Un modo estremamente civile per sviluppare un'applicazione"
Β 
Undici anni di lavoro con Python
Undici anni di lavoro con PythonUndici anni di lavoro con Python
Undici anni di lavoro con Python
Β 
socket e SocketServer: il framework per i server Internet in Python
socket e SocketServer: il framework per i server Internet in Pythonsocket e SocketServer: il framework per i server Internet in Python
socket e SocketServer: il framework per i server Internet in Python
Β 
Qt mobile PySide bindings
Qt mobile PySide bindingsQt mobile PySide bindings
Qt mobile PySide bindings
Β 
Python: ottimizzazione numerica algoritmi genetici
Python: ottimizzazione numerica algoritmi geneticiPython: ottimizzazione numerica algoritmi genetici
Python: ottimizzazione numerica algoritmi genetici
Β 
Python idiomatico
Python idiomaticoPython idiomatico
Python idiomatico
Β 
Python in the browser
Python in the browserPython in the browser
Python in the browser
Β 
PyPy 1.2: snakes never crawled so fast
PyPy 1.2: snakes never crawled so fastPyPy 1.2: snakes never crawled so fast
PyPy 1.2: snakes never crawled so fast
Β 
OpenERP e l'arte della gestione aziendale con Python
OpenERP e l'arte della gestione aziendale con PythonOpenERP e l'arte della gestione aziendale con Python
OpenERP e l'arte della gestione aziendale con Python
Β 
New and improved: Coming changes to the unittest module
 	 New and improved: Coming changes to the unittest module 	 New and improved: Coming changes to the unittest module
New and improved: Coming changes to the unittest module
Β 
Monitoraggio del Traffico di Rete Usando Python ed ntop
Monitoraggio del Traffico di Rete Usando Python ed ntopMonitoraggio del Traffico di Rete Usando Python ed ntop
Monitoraggio del Traffico di Rete Usando Python ed ntop
Β 
Jython for embedded software validation
Jython for embedded software validationJython for embedded software validation
Jython for embedded software validation
Β 
Foxgame introduzione all'apprendimento automatico
Foxgame introduzione all'apprendimento automaticoFoxgame introduzione all'apprendimento automatico
Foxgame introduzione all'apprendimento automatico
Β 
Effective EC2
Effective EC2Effective EC2
Effective EC2
Β 
Django Γ¨ pronto per l'Enterprise
Django Γ¨ pronto per l'EnterpriseDjango Γ¨ pronto per l'Enterprise
Django Γ¨ pronto per l'Enterprise
Β 
Crogioli, alambicchi e beute: dove mettere i vostri dati.
Crogioli, alambicchi e beute: dove mettere i vostri dati.Crogioli, alambicchi e beute: dove mettere i vostri dati.
Crogioli, alambicchi e beute: dove mettere i vostri dati.
Β 
Comet web applications with Python, Django & Orbited
Comet web applications with Python, Django & OrbitedComet web applications with Python, Django & Orbited
Comet web applications with Python, Django & Orbited
Β 
Cleanup and new optimizations in WPython 1.1
Cleanup and new optimizations in WPython 1.1Cleanup and new optimizations in WPython 1.1
Cleanup and new optimizations in WPython 1.1
Β 

Recently uploaded

Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksSoftradix Technologies
Β 
FULL ENJOY πŸ” 8264348440 πŸ” Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY πŸ” 8264348440 πŸ” Call Girls in Diplomatic Enclave | DelhiFULL ENJOY πŸ” 8264348440 πŸ” Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY πŸ” 8264348440 πŸ” Call Girls in Diplomatic Enclave | Delhisoniya singh
Β 
Artificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraArtificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraDeakin University
Β 
Azure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & ApplicationAzure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & ApplicationAndikSusilo4
Β 
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptxMaking_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptxnull - The Open Security Community
Β 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...shyamraj55
Β 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Alan Dix
Β 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
Β 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
Β 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptxLBM Solutions
Β 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxOnBoard
Β 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
Β 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
Β 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
Β 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsMemoori
Β 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 3652toLead Limited
Β 
How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?XfilesPro
Β 
Next-generation AAM aircraft unveiled by Supernal, S-A2
Next-generation AAM aircraft unveiled by Supernal, S-A2Next-generation AAM aircraft unveiled by Supernal, S-A2
Next-generation AAM aircraft unveiled by Supernal, S-A2Hyundai Motor Group
Β 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticscarlostorres15106
Β 

Recently uploaded (20)

Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other Frameworks
Β 
FULL ENJOY πŸ” 8264348440 πŸ” Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY πŸ” 8264348440 πŸ” Call Girls in Diplomatic Enclave | DelhiFULL ENJOY πŸ” 8264348440 πŸ” Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY πŸ” 8264348440 πŸ” Call Girls in Diplomatic Enclave | Delhi
Β 
Artificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraArtificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning era
Β 
Azure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & ApplicationAzure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & Application
Β 

PyCuda: How to harness the power of graphics cards in Python applications

  • 1. PyCUDA: Harnessing the power of GPU with Python
  • 2. Talk Structure 1. Why a GPU? 2. How does it work? 3. How do I program it? 4. Can I use Python? PyCon 4 – Florence 2010 – Fabrizio Milo
  • 3. Talk Structure 1. Why a GPU? 2. How does it work? 3. How do I program it? 4. Can I use Python?
  • 4. WHY A GPU?
  • 5. APPLICATIONS & DEMOS
  • 6. Why GPU?
  • 7. Talk Structure 1. Why a GPU? 2. How does it work? 3. How do I program it? 4. Can I use Python?
  • 8. How does it work?
  • 9. Diagram: CPU (control logic, cache, four ALUs, DRAM)
  • 10. Diagram: GPU with its DRAM
  • 11. Diagram: CPU and GPU side by side (CPU: control logic, cache, four ALUs, DRAM; GPU: DRAM)
  • 12. CUDA
  • 13. Compute Unified Device Architecture
  • 14. CUDA: A Parallel Computing Architecture for NVIDIA GPUs (diagram label: DirectX Compute)
  • 15. Execution Model and CUDA Device Model
  • 16. EXECUTION MODEL
  • 17. Thread: the smallest unit of execution
  • 18. A Block: a group of threads
  • 19. A Grid: a group of blocks
  • 20. One block can have many threads (up to 512 on 10-series GPUs)
  • 21. One grid can have many blocks
  • 22. The hardware: DEVICE MODEL
  • 23. Scalar Processor
  • 24. Scalar Processor
  • 25. Many Scalar Processors
  • 26. + Register File
  • 27. + Shared Memory
  • 28. Multiprocessor
  • 29. Device
  • 30. Real example: 10-Series Architecture: 240 Scalar Processor (SP) cores execute kernel threads; 30 Streaming Multiprocessors (SMs), each containing 8 scalar processors, 1 double-precision unit, and shared memory
  • 31. Software vs. hardware: a Thread maps to a Scalar Processor, a Thread Block to a Multiprocessor, a Grid to a Device
  • 32. Global Memory
  • 33. Global Memory
  • 34. Host - Device (diagram: CPU and RAM on the host; global memory on the device)
  • 35. Host – Multi-Device (diagram: CPU and RAM on the host, connected to several devices)
  • 36. 1. Why a GPU? 2. How does it work? 3. How do I program it? 4. Can I use Python?
  • 37. Software vs. hardware: a Thread maps to a Scalar Processor, a Thread Block to a Multiprocessor, a Grid to a Device
  • 38. Kernel (label: Thread): __global__ void multiply_them(float *dest, float *a, float *b) { const int i = threadIdx.x; dest[i] = a[i] * b[i]; }
  • 39. Kernel (label: Thread): __global__ void multiply_them(float *dest, float *a, float *b) { const int i = threadIdx.x; dest[i] = a[i] * b[i]; }
  • 40. Kernel (label: Block): __global__ void multiply_them(float *dest, float *a, float *b) { const int i = threadIdx.x; dest[i] = a[i] * b[i]; }
  • 41. Kernel (label: Grid): __global__ void kernel( … ) { const int idx = blockIdx.x * blockDim.x + threadIdx.x; … }
  • 42. How do I program it? Diagram: main logic → GCC → .bin → CPU; kernel → NVCC → .cubin → GPU
  • 43. How do I program it? (continuation of the GCC/NVCC diagram)
  • 44. Host - Device (diagram: CPU and RAM on the host; global memory on the device)
  • 45. Diagram: CPU and RAM on the host; global memory on the device
  • 46. Allocate memory: cudaMalloc(&pointer, size)
  • 47. Copy to device: cudaMalloc(&pointer, size); cudaMemcpy(dest, src, size, direction)
  • 48. Kernel launch: cudaMalloc(&pointer, size); cudaMemcpy(dest, src, size, direction); kernel<<<n_blocks, n_threads>>>(params)
  • 49. Get back the results: cudaMalloc(&pointer, size); cudaMemcpy(dest, src, size, direction); kernel<<<n_blocks, n_threads>>>(params); cudaMemcpy(dest, src, size, direction)
  • 50. Error handling: if (cudaMalloc(&pointer, size) != cudaSuccess) { handle_error(); }
  • 51. And soon it becomes … if (cudaMalloc(&pointer, size) != cudaSuccess) { handle_error(); } if (cudaMemcpy(dest, src, size, direction) != cudaSuccess) { handle_error(); } kernel<<<n_blocks, n_threads>>>(params); if (cudaGetLastError() != cudaSuccess) { handle_error(); } if (cudaMemcpy(dest, src, size, direction) != cudaSuccess) { handle_error(); }
  • 52. And soon it becomes … (the error-checking code from slide 51, repeated over and over across the whole slide)
  • 53.
  • 54. 1. Why a GPU? 2. How does it work? 3. How do I program it? 4. Can I use Python?
  • 55. Python + CUDA & ANDREAS KLÖCKNER = PYCUDA
  • 56. PyCuda Philosophy: Provide complete access
  • 57. PyCuda Philosophy: Automatically manage resources
  • 58. PyCuda Philosophy: Check and report errors
  • 59. PyCuda Philosophy: Cross-platform
  • 60. PyCuda Philosophy: Allow interactive use
  • 61. PyCuda Philosophy: NumPy integration
  • 62. NUMPY ARRAY
  • 63. Diagram: an array of one hundred 1s, indices 0 to 99. import numpy; my_array = numpy.array([1] * 100)
  • 64. Diagram: the same array with element 3 set to 0. import numpy; my_array = numpy.array([1] * 100); my_array[3] = 0
  • 65. PyCuda: Workflow
  • 66. PyCuda: Workflow
  • 67. PyCuda: Workflow
  • 68. Memory allocation: cuda.mem_alloc(size_bytes)
  • 69. Memory copy: gpu_mem = cuda.mem_alloc(size_bytes); cuda.memcpy_htod(gpu_mem, cpu_mem)
  • 70. Kernel: gpu_mem = cuda.mem_alloc(size_bytes); cuda.memcpy_htod(gpu_mem, cpu_mem); SourceModule("""__global__ void multiply_them(float *dest, float *a, float *b) { const int i = threadIdx.x; dest[i] = a[i] * b[i]; }""")
  • 71. Kernel launch: mod = SourceModule("""__global__ void multiply_them(float *dest, float *a, float *b) { const int i = threadIdx.x; dest[i] = a[i] * b[i]; }"""); multiply_them = mod.get_function("multiply_them"); multiply_them(*args, block=(n, 1, 1)), where n is the number of elements (the kernel indexes only threadIdx.x, and a block may hold at most 512 threads on 10-series GPUs)
  • 72.
  • 73.
  • 74.
  • 75. Hello GPU DEMO
  • 76. GPUARRAY
  • 77. gpuarray
  • 78. PyCuda: GpuArray: gpu_array = gpuarray.to_gpu(numpy_array); numpy_array = gpu_array.get()
  • 79. PyCuda: GpuArray: gpu_array = gpuarray.to_gpu(numpy_array); numpy_array = gpu_array.get(); operations: +, -, *, /, fill, sin, exp, rand, basic indexing, norm, inner product …
  • 80. PyCuda: GpuArray: ElementWise: from pycuda.elementwise import ElementwiseKernel
  • 81. PyCuda: GpuArray: ElementWise: from pycuda.elementwise import ElementwiseKernel; lincomb = ElementwiseKernel("float a, float *x, float b, float *y, float *z", "z[i] = a*x[i] + b*y[i]")
  • 82. PyCuda: GpuArray: ElementWise: from pycuda.elementwise import ElementwiseKernel; lincomb = ElementwiseKernel("float a, float *x, float b, float *y, float *z", "z[i] = a*x[i] + b*y[i]"); c_gpu = gpuarray.empty_like(a_gpu); lincomb(5, a_gpu, 6, b_gpu, c_gpu); assert la.norm((c_gpu - (5*a_gpu + 6*b_gpu)).get()) < 1e-5
  • 83. Meta-Programming: __kernel_template__ = """__global__ void kernel(args) { for (int i = 0; i < {{ iterations }}; i++) { {{ operations }} } }""" (see, for example, Jinja2)
  • 84. Meta-Programming
  • 85. Meta-Programming: generate source!
  • 86. Performance?
  • 87. Mandelbrot DEMO
  • 88. PyCuda: Documentation
  • 89. PyCuda website: http://mathema.tician.de/software/pycuda; License: X Consortium License (no warranty, free for all use); Dependencies: Python 2.4+, NumPy, Boost
  • 90. In the future … OpenCL
  • 91. THANK YOU & HAVE FUN!
  • 92. ?
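Slides 38 to 41 move from a kernel that uses only threadIdx.x to the global index blockIdx.x * blockDim.x + threadIdx.x. The following is a minimal CPU-only sketch of that indexing, assuming plain Python lists stand in for device arrays. `launch` is a hypothetical helper, not a CUDA or PyCUDA API, and it runs the threads one after another instead of in parallel.

```python
# CPU-only emulation of a 1-D CUDA launch. `launch` is a hypothetical helper,
# not a CUDA or PyCUDA API: it calls the kernel once per (block, thread) pair,
# one after another, whereas a GPU runs these threads in parallel.
from types import SimpleNamespace


def launch(kernel, grid_dim, block_dim, *args):
    """Run `kernel` for every thread of `grid_dim` blocks of `block_dim` threads."""
    for block_x in range(grid_dim):
        for thread_x in range(block_dim):
            # Mimic CUDA's built-in variables blockIdx, blockDim and threadIdx.
            ctx = SimpleNamespace(
                blockIdx=SimpleNamespace(x=block_x),
                blockDim=SimpleNamespace(x=block_dim),
                threadIdx=SimpleNamespace(x=thread_x),
            )
            kernel(ctx, *args)


def multiply_them(ctx, dest, a, b, n):
    # Global index, as on slide 41.
    idx = ctx.blockIdx.x * ctx.blockDim.x + ctx.threadIdx.x
    # Bounds guard (not on the slides): the last block may have spare threads.
    if idx < n:
        dest[idx] = a[idx] * b[idx]


n = 10
a = [float(i) for i in range(n)]
b = [2.0] * n
dest = [0.0] * n
launch(multiply_them, 3, 4, dest, a, b, n)  # 3 blocks x 4 threads = 12 threads for 10 elements
print(dest)
```

Only 10 of the 12 emulated threads write anything; without the guard, the threads with idx 10 and 11 would index past the end of the lists (an IndexError here, an out-of-bounds access on a GPU).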
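Slide 48 launches a kernel with a chosen number of blocks and threads per block, and slides 20 and 21 note that a block holds many threads and a grid many blocks. A common rule, sketched here under the assumption of a 1-D launch, is to fix the threads per block and round the block count up so every element gets a thread. `blocks_needed`, `threads_in_block` and `MAX_THREADS_PER_BLOCK` are hypothetical names; 512 is the per-block limit on 10-series (compute capability 1.x) GPUs, and later GPUs allow 1024.

```python
# Hypothetical helpers for choosing a 1-D launch configuration.
# 512 threads per block is the limit on 10-series (compute capability 1.x)
# GPUs; later GPUs allow 1024.
MAX_THREADS_PER_BLOCK = 512


def threads_in_block(block):
    """Total number of threads in a block shape given as an (x, y, z) tuple."""
    x, y, z = block
    return x * y * z


def blocks_needed(n, threads_per_block):
    """Smallest block count such that blocks * threads_per_block >= n."""
    if not 0 < threads_per_block <= MAX_THREADS_PER_BLOCK:
        raise ValueError(
            f"threads_per_block must be between 1 and {MAX_THREADS_PER_BLOCK}"
        )
    # Integer ceiling of n / threads_per_block without floating point.
    return (n + threads_per_block - 1) // threads_per_block


print(blocks_needed(1000, 256))
```

For example, blocks_needed(1000, 256) returns 4: four blocks of 256 threads give 1,024 threads, and an index guard such as `if (idx < n)` keeps the last 24 idle. threads_in_block also shows why a block shape has to be checked as a product: (32, 32, 1) asks for 1,024 threads, which is over the 10-series limit.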
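Slides 50 to 52 show per-call status checks piling up in C, and slide 58 lists "check and report errors" as a PyCuda goal; PyCuda reports failures as Python exceptions derived from pycuda.driver.Error. The sketch below only illustrates the general pattern of turning status codes into exceptions. `check`, `GPUError`, `fake_malloc` and `fake_memcpy` are hypothetical stand-ins, and nothing here calls CUDA.

```python
# Pattern sketch: convert C-style status codes into exceptions so call sites
# need no per-call if-blocks. `check`, `GPUError`, `fake_malloc` and
# `fake_memcpy` are hypothetical stand-ins; nothing here talks to a GPU.
SUCCESS = 0  # like cudaSuccess, which is 0


class GPUError(RuntimeError):
    """Raised by check() when a call reports a non-zero status."""


def check(status, what):
    if status != SUCCESS:
        raise GPUError(f"{what} failed with status {status}")


def fake_malloc(size):
    # Hypothetical rule: a non-positive size is reported as error status 1.
    return SUCCESS if size > 0 else 1


def fake_memcpy(size):
    # Hypothetical: the copy always succeeds in this sketch.
    return SUCCESS


def run(size):
    # One line per step instead of one if-block per step; the first failure
    # propagates to the caller as an exception.
    check(fake_malloc(size), "malloc")
    check(fake_memcpy(size), "memcpy")
    return "ok"


print(run(16))
```

The caller can still handle errors in one place with a single try/except around run(), rather than after every call.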
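Slides 81 and 82 define lincomb with ElementwiseKernel, where PyCuda wraps the one-line body z[i] = a*x[i] + b*y[i] in a kernel that applies it at every index i. As a hedged illustration of what that body computes, here is a pure-Python reference using lists instead of GPU arrays, ending with the same kind of norm check as slide 82. It is not how PyCuda generates or runs the kernel.

```python
# CPU reference for the elementwise body "z[i] = a*x[i] + b*y[i]".
# Plain Python lists stand in for GPU arrays; this does not use PyCuda.
import math


def lincomb(a, x, b, y, z):
    # Apply the elementwise body once per index i, writing into z in place.
    for i in range(len(z)):
        z[i] = a * x[i] + b * y[i]


def norm(v):
    # Euclidean norm, the quantity numpy.linalg.norm returns for a 1-D vector.
    return math.sqrt(sum(t * t for t in v))


x = [1.0, 2.0, 3.0]
y = [0.5, 0.25, 0.0]
z = [0.0] * len(x)
lincomb(5, x, 6, y, z)

# Same style of check as slide 82: compare against 5*x + 6*y computed directly.
expected = [5 * xi + 6 * yi for xi, yi in zip(x, y)]
assert norm([zi - ei for zi, ei in zip(z, expected)]) < 1e-5
print(z)
```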
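Slide 83 generates kernel source from a template with iterations and operations placeholders and points to Jinja2. To keep this sketch dependency-free it swaps in the standard library's string.Template ($name placeholders) instead of Jinja2; the kernel name and its single float *data parameter are hypothetical. The rendered string is ordinary CUDA C that could then be passed to SourceModule.

```python
# Generating kernel source from a template. This sketch uses the standard
# library's string.Template instead of Jinja2; the kernel signature below is
# a hypothetical example.
from string import Template

KERNEL_TEMPLATE = Template("""
__global__ void kernel(float *data)
{
    for (int i = 0; i < $iterations; i++) {
        $operations
    }
}
""")


def render_kernel(iterations, operations):
    # substitute() raises KeyError if a placeholder is left unfilled.
    return KERNEL_TEMPLATE.substitute(iterations=iterations, operations=operations)


source = render_kernel(100, "data[threadIdx.x] *= 2.0f;")
print(source)
```

Baking values such as the loop count into the source lets the CUDA compiler treat them as constants; the trade-off is that each distinct rendering is a separate piece of source that has to be compiled.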