SlideShare a Scribd company logo
Numba: An array-oriented
   Python compiler
  SIAM Conference on Computational
       Science and Engineering
          Travis E. Oliphant

           February 25, 2012
Big Picture
   Empower domain experts with
high-level tools that exploit modern
             hard-ware


                           ?

          Array Oriented Computing
Software Stack Future?
         Plateaus of Code re-use + DSLs
   SQL                                R
            TDPL                                Matlab


                    Python


             OBJC      Julia     C
  FORTRAN                                 C++



                     LLVM
Motivation
• Python is great for rapid development and
  high-level thinking-in-code
• It is slow for interior loops because lack of
  type information leads to a lot of indirection
  and “extra” code.
Motivation
• NumPy users have had a lot of type
  information for a long time --- but only
  currently have one-size fits all pre-
  compiled, vectorized loops.
• Idea is to use this type information to
  allow compilation of arbitrary
  expressions involving NumPy arrays
Current approaches
• Cython
• Weave
• Write fast-code in C/C++/Fortran and “wrap” with
 - SWIG or Cython
 - f2py or fwrap
 - hand-written

 Numba philosophy : don’t wrap or rewrite
 --- just decorate
Simple API	

• jit --- provide type information (fastest to call)
• autojit --- detects input types, infers output, generates
 code if needed, and dispatches (a little more
 expensive to call)

     @jit('void(double[:,:], double, double)')
     #@autojit
     def numba_update(u, dx2, dy2):
         nx, ny = u.shape
         for i in xrange(1,nx-1):
             for j in xrange(1, ny-1):
                 u[i,j] = ((u[i+1,j] + u[i-1,j]) * dy2 +
                           (u[i,j+1] + u[i,j-1]) * dx2) / (2*(dx2+dy2))
~150x speed-up       Real-time image
                 processing in Python (50
                     fps Mandelbrot)
Image Processing                                  ~1500x speed-up




   @jit('void(f8[:,:],f8[:,:],f8[:,:])')
   def filter(image, filt, output):
       M, N = image.shape
       m, n = filt.shape
       for i in range(m//2, M-m//2):
           for j in range(n//2, N-n//2):
               result = 0.0
               for k in range(m):
                   for l in range(n):
                       result += image[i+k-m//2,j+l-n//2]*filt[k, l]
               output[i,j] = result
NumPy + Mamba = Numba
 Python Function                         Machine Code


                       LLVM-PY

                   LLVM Library
       ISPC   OpenCL     OpenMP   CUDA      CLANG


    Intel     AMD        Nvidia     Apple       ARM
Example
@jit(‘f8(f8)’)
def sinc(x):
    if x==0.0:
        return 1.0
    else:
        return sin(x*pi)/(pi*x)




         Numba
Compiler Overview

                  C++
                                      x86

                   C
                           LLVM IR    ARM
                Fortran

                                      PTX
                 Python




     Numba turns Python into a “compiled
      language” (but much more flexible)
Compile NumPy array expressions

  import numbapro
  from numba import autojit

  @autojit
  def formula(a, b, c):
      a[1:,1:] = a[1:,1:] + b[1:,:-1] + c[1:,:-1]

  @autojit
  def express(m1, m2):
      m2[1:-1:2,0,...,::2] = (m1[1:-1:2,...,::2] *
                               m1[-2:1:-2,...,::2])
      return m2
Fast vectorize
 NumPy’s ufuncs take “kernels” and
apply the kernel element-by-element
          over entire arrays
                                      Write kernels in
from numbapro import vectorize
from math import sin
                                         Python!
@vectorize([‘f8(f8)’, ‘f4(f4)’])
def sinc(x):
   if x==0.0:
        return 1.0
    else:
        return sin(x*pi)/(pi*x)
Updated Laplace Example
    https://github.com/teoliphant/speed.git
      Version        Time         Speed Up
       NumPy          3.19            1.0
       Numba          2.32           1.38
     Vect. Numba      2.33           1.37
       Cython         2.38           1.34
        Weave         2.47           1.29
      Numexpr         2.62           1.22
    Fortran Loops     2.30           1.39
     Vect. Fortran    1.50           2.13
Many Advanced Features
• Extension classes (jit a class)
• Struct support (NumPy arrays can be structs)
• SSA --- can refer to local variables as different types
• Typed lists and typed dictionaries coming
• pointer support
• calling ctypes and CFFI functions natively
• pycc (create stand-alone dynamic library and executable)
• pycc --python (create static extension module for Python)
Ufuncs


                Generalized
                 UFuncs
                                                          Python
                                                         Function
                 Window
                 Kernel
                  Funcs

                 Function-
                                                                    Uses of Numba




                   based
                 Indexing


                 Memory
                  Filters
                                                 Numba




NumPy Runtime
                I/O Filters



                Reduction
                 Filters


                Computed
                Columns
                              function pointer
Status
• Main team sponsored by Continuum Analytics and
    composed of:
    - Travis Oliphant (NumPy, SciPy)
    - Jon Riehl (Mython, PyFront, Basil, ...)
    - Mark Florrison (minivect, Cython)
    - Siu Kwan Lam (pymothoa, llvmpy)
•   Rapid progress this year
•   Version 0.6 released first of February
•   Version 0.7 next week                             numba.pydata.org
•   Version 0.8 first of April
•   Stable API (jit, autojit) easy to use
•   Full Python support by 1.0 at end of summer
•   Should be able to write equivalent of NumPy and
    SciPy with Numba
Introducing NumbaPro

                              • Create parallel-for loops
                              • Parallel execution of
                                  ufuncs
  fast development and fast   •   Run ufuncs on the GPU
  execution!                  •   Write CUDA directly in
                                  Python!
Python and NumPy compiled to
                             •    Free for Academics
     Parallel Architectures
     (GPUs and multi-core
           machines)
Create parallel-for loops

 import numbapro # import first to make prange available
 from numba import autojit, prange

 @autojit
 def parallel_sum2d(a):
     sum = 0.0
     for i in prange(a.shape[0]):
          for j in range(a.shape[1]):
              sum += a[i,j]
Ufuncs in parallel (multi-core or GPU)
   from numbapro import vectorize
   from math import sin

   @vectorize([‘f8(f8)’, ‘f4(f4)’], target=‘gpu’)
   def sinc(x):
      if x==0.0:
           return 1.0
       else:
           return sin(x*pi)/(pi*x)

   @vectorize([‘f8(f8)’, ‘f4(f4)’], target=‘parallel’)
   def sinc2(x):
      if x==0.0:
           return 1.0
       else:
           return sin(x*pi)/(pi*x)
Introducing CUDA-Python
from numbapro import cuda
from numba import autojit

@autojit(target=‘gpu’)
def array_scale(src, dst, scale):
    tid = cuda.threadIdx.x
    blkid = cuda.blockIdx.x
    blkdim = cuda.blockDim.x              CUDA Development
    i = tid + blkid * blkdim

    if i >= n:                            using Python syntax!
        return

    dst[i] = src[i] * scale

src = np.arange(N, dtype=np.float)
dst = np.empty_like(src)

array_scale[grid, block](src, dst, 5.0)
Example: Matrix multiply
@cuda.jit(argtypes=[f4[:,:], f4[:,:], f4[:,:]])
def cu_square_matrix_mul(A, B, C):
    sA = cuda.shared.array(shape=(tpb, tpb),
dtype=f4)
    sB = cuda.shared.array(shape=(tpb, tpb),
dtype=f4)                                           bpg = 50
                                                    tpb = 32
    tx   =    cuda.threadIdx.x                      n = bpg * tpb
    ty   =    cuda.threadIdx.y
    bx   =    cuda.blockIdx.x                       A = np.array(np.random.random((n, n)),
    by   =    cuda.blockIdx.y                       dtype=np.float32)
    bw   =    cuda.blockDim.x
    bh   =    cuda.blockDim.y
                                                    B = np.array(np.random.random((n, n)),
                                                    dtype=np.float32)
    x = tx + bx * bw                                C = np.empty_like(A)
    y = ty + by * bh
                                                    stream = cuda.stream()
    acc = 0.                                        with stream.auto_synchronize():
    for i in range(bpg):                                dA = cuda.to_device(A, stream)
        if x < n and y < n:                             dB = cuda.to_device(B, stream)
             sA[ty, tx] = A[y, tx + i * tpb]
             sB[ty, tx] = B[ty + i * tpb, x]
                                                        dC = cuda.to_device(C, stream)
                                                        cu_square_matrix_mul[(bpg, bpg),
             cuda.syncthreads()                     (tpb, tpb), stream](dA, dB, dC)
                                                        dC.to_host(stream)
             if x < n and y < n:
                 for j in range(tpb):
                     acc += sA[ty, j] * sB[j, tx]

             cuda.syncthreads()

    if x < n and y < n:
        C[y, x] = acc
Early Performance Results
     core     GeForce GTX
     i7       560 Ti
                            Already about 6x
                              faster on the
                                  GPU.
Example: Black-Scholes
@cuda.jit(argtypes=(double,), restype=double, device=True, inline=True)
def cnd_cuda(d):
                                                                      @cuda.jit(argtypes=(double[:], double[:],
    A1 = 0.31938153
                                                                      double[:],
    A2 = -0.356563782
                                                                                           double[:], double[:],
    A3 = 1.781477937
                                                                      double, double))
    A4 = -1.821255978
                                                                      def black_scholes_cuda(callResult, putResult,
    A5 = 1.330274429
                                                                      S, X, T, R, V):
    RSQRT2PI = 0.39894228040143267793994605993438
                                                                          # S = stockPrice
    K = 1.0 / (1.0 + 0.2316419 * math.fabs(d))
                                                                          # X = optionStrike
    ret_val = (RSQRT2PI * math.exp(-0.5 * d * d) *
                                                                          # T = optionYears
                (K * (A1 + K * (A2 + K * (A3 + K * (A4 + K * A5))))))
                                                                          # R = Riskfree
    if d > 0:
                                                                          # V = Volatility
         ret_val = 1.0 - ret_val
                                                                          i = cuda.threadIdx.x + cuda.blockIdx.x *
    return ret_val
                                                                      cuda.blockDim.x
 blockdim = 1024, 1                                                       if i >= S.shape[0]:
 griddim = int(math.ceil(float(OPT_N)/blockdim[0])), 1                         return
 stream = cuda.stream()                                                   sqrtT = math.sqrt(T[i])
 d_callResult = cuda.to_device(callResultNumbapro,                        d1 = (math.log(S[i] / X[i]) +
 stream)                                                                          (R + 0.5 * V * V) * T[i]) / (V *
 d_putResult = cuda.to_device(putResultNumbapro,                      sqrtT)
 stream)                                                                  d2 = d1 - V * sqrtT
 d_stockPrice = cuda.to_device(stockPrice, stream)                        cndd1 = cnd_cuda(d1)
 d_optionStrike = cuda.to_device(optionStrike, stream)                    cndd2 = cnd_cuda(d2)
 d_optionYears = cuda.to_device(optionYears, stream)
 for i in range(iterations):                                              expRT = math.exp((-1. * R) * T[i])
     black_scholes_cuda[griddim, blockdim, stream](                       callResult[i] = (S[i] * cndd1 - X[i] *
           d_callResult, d_putResult, d_stockPrice,                   expRT * cndd2)
 d_optionStrike,                                                          putResult[i] = (X[i] * expRT * (1.0 -
           d_optionYears, RISKFREE, VOLATILITY)                       cndd2) -
     d_callResult.to_host(stream)                                                                  S[i] * (1.0 -
     d_putResult.to_host(stream)                                                                 cndd1))
     stream.synchronize()
Black-Scholes: Results
       core         GeForce GTX
       i7           560 Ti
                                     Already
                                    about 6x
                                  faster on the
                                      GPU
NumFOCUS
       www.numfocus.org
      501(c)3 Public Charity
NumFOCUS Mission


• Sponsor development of high-level languages and
  libraries for science
• Foster teaching of array-oriented and higher-order
  computational approaches and applied
  computational science
• Promote the use of open code in science and
  encourage reproducible and accessible research
NumFOCUS Activities

 • Sponsor sprints and conferences
 • Provide scholarships and grants
 • Provide bounties and prizes for code development
 • Pay for freely-available documentation and basic
   course development
 • Equipment grants
 • Sponsor BootCamps
 • Raise funds from industries using open source
   high-level languages

More Related Content

What's hot

Python Functions Tutorial | Working With Functions In Python | Python Trainin...
Python Functions Tutorial | Working With Functions In Python | Python Trainin...Python Functions Tutorial | Working With Functions In Python | Python Trainin...
Python Functions Tutorial | Working With Functions In Python | Python Trainin...
Edureka!
 
Numeric Data types in Python
Numeric Data types in PythonNumeric Data types in Python
Numeric Data types in Python
jyostna bodapati
 
Data types in python
Data types in pythonData types in python
Data types in python
Learnbay Datascience
 
Archiving in linux tar
Archiving in linux tarArchiving in linux tar
Archiving in linux tar
InfoExcavator
 
Arrays in python
Arrays in pythonArrays in python
Arrays in python
Lifna C.S
 
02. chapter 3 lexical analysis
02. chapter 3   lexical analysis02. chapter 3   lexical analysis
02. chapter 3 lexical analysisraosir123
 
Arrays in python
Arrays in pythonArrays in python
Arrays in python
moazamali28
 
Modules and packages in python
Modules and packages in pythonModules and packages in python
Modules and packages in python
TMARAGATHAM
 
Stack and Queue
Stack and Queue Stack and Queue
Stack and Queue
Apurbo Datta
 
移植FreeRTOS 之嵌入式軟體研究與開發
移植FreeRTOS 之嵌入式軟體研究與開發移植FreeRTOS 之嵌入式軟體研究與開發
移植FreeRTOS 之嵌入式軟體研究與開發
艾鍗科技
 
List in Python
List in PythonList in Python
List in Python
Sharath Ankrajegowda
 
Python Tutorial
Python TutorialPython Tutorial
Python Tutorial
AkramWaseem
 
Data Structures- Part5 recursion
Data Structures- Part5 recursionData Structures- Part5 recursion
Data Structures- Part5 recursion
Abdullah Al-hazmy
 
Introduction to NumPy
Introduction to NumPyIntroduction to NumPy
Introduction to NumPy
Huy Nguyen
 
Implementation of c string functions
Implementation of c string functionsImplementation of c string functions
Implementation of c string functions
mohamed sikander
 
Python Basics
Python BasicsPython Basics
Python Basics
primeteacher32
 
Stacks & Queues By Ms. Niti Arora
Stacks & Queues By Ms. Niti AroraStacks & Queues By Ms. Niti Arora
Stacks & Queues By Ms. Niti Arora
kulachihansraj
 
BPF: Tracing and more
BPF: Tracing and moreBPF: Tracing and more
BPF: Tracing and more
Brendan Gregg
 
Complete C++ programming Language Course
Complete C++ programming Language CourseComplete C++ programming Language Course
Complete C++ programming Language Course
Vivek chan
 
Java Notes by C. Sreedhar, GPREC
Java Notes by C. Sreedhar, GPRECJava Notes by C. Sreedhar, GPREC
Java Notes by C. Sreedhar, GPREC
Sreedhar Chowdam
 

What's hot (20)

Python Functions Tutorial | Working With Functions In Python | Python Trainin...
Python Functions Tutorial | Working With Functions In Python | Python Trainin...Python Functions Tutorial | Working With Functions In Python | Python Trainin...
Python Functions Tutorial | Working With Functions In Python | Python Trainin...
 
Numeric Data types in Python
Numeric Data types in PythonNumeric Data types in Python
Numeric Data types in Python
 
Data types in python
Data types in pythonData types in python
Data types in python
 
Archiving in linux tar
Archiving in linux tarArchiving in linux tar
Archiving in linux tar
 
Arrays in python
Arrays in pythonArrays in python
Arrays in python
 
02. chapter 3 lexical analysis
02. chapter 3   lexical analysis02. chapter 3   lexical analysis
02. chapter 3 lexical analysis
 
Arrays in python
Arrays in pythonArrays in python
Arrays in python
 
Modules and packages in python
Modules and packages in pythonModules and packages in python
Modules and packages in python
 
Stack and Queue
Stack and Queue Stack and Queue
Stack and Queue
 
移植FreeRTOS 之嵌入式軟體研究與開發
移植FreeRTOS 之嵌入式軟體研究與開發移植FreeRTOS 之嵌入式軟體研究與開發
移植FreeRTOS 之嵌入式軟體研究與開發
 
List in Python
List in PythonList in Python
List in Python
 
Python Tutorial
Python TutorialPython Tutorial
Python Tutorial
 
Data Structures- Part5 recursion
Data Structures- Part5 recursionData Structures- Part5 recursion
Data Structures- Part5 recursion
 
Introduction to NumPy
Introduction to NumPyIntroduction to NumPy
Introduction to NumPy
 
Implementation of c string functions
Implementation of c string functionsImplementation of c string functions
Implementation of c string functions
 
Python Basics
Python BasicsPython Basics
Python Basics
 
Stacks & Queues By Ms. Niti Arora
Stacks & Queues By Ms. Niti AroraStacks & Queues By Ms. Niti Arora
Stacks & Queues By Ms. Niti Arora
 
BPF: Tracing and more
BPF: Tracing and moreBPF: Tracing and more
BPF: Tracing and more
 
Complete C++ programming Language Course
Complete C++ programming Language CourseComplete C++ programming Language Course
Complete C++ programming Language Course
 
Java Notes by C. Sreedhar, GPREC
Java Notes by C. Sreedhar, GPRECJava Notes by C. Sreedhar, GPREC
Java Notes by C. Sreedhar, GPREC
 

Viewers also liked

Python Seminar PPT
Python Seminar PPTPython Seminar PPT
Python Seminar PPT
Shivam Gupta
 
GPU Computing With Apache Spark And Python
GPU Computing With Apache Spark And PythonGPU Computing With Apache Spark And Python
GPU Computing With Apache Spark And Python
Jen Aman
 
Numba Overview
Numba OverviewNumba Overview
Numba Overview
stan_seibert
 
Scaling PyData Up and Out
Scaling PyData Up and OutScaling PyData Up and Out
Scaling PyData Up and Out
Travis Oliphant
 
Scala.io
Scala.ioScala.io
Scala.io
Steve Gury
 
The State of High-Performance Computing in the Open-Source R Ecosystem
The State of High-Performance Computing in the Open-Source R EcosystemThe State of High-Performance Computing in the Open-Source R Ecosystem
The State of High-Performance Computing in the Open-Source R Ecosystem
Intel® Software
 
Buzzwords Numba Presentation
Buzzwords Numba PresentationBuzzwords Numba Presentation
Buzzwords Numba Presentationkammeyer
 
Accelerate Your Python* Code through Profiling, Tuning, and Compilation Part ...
Accelerate Your Python* Code through Profiling, Tuning, and Compilation Part ...Accelerate Your Python* Code through Profiling, Tuning, and Compilation Part ...
Accelerate Your Python* Code through Profiling, Tuning, and Compilation Part ...
Intel® Software
 
Numba: Flexible analytics written in Python with machine-code speeds and avo...
Numba:  Flexible analytics written in Python with machine-code speeds and avo...Numba:  Flexible analytics written in Python with machine-code speeds and avo...
Numba: Flexible analytics written in Python with machine-code speeds and avo...
PyData
 
Numba
NumbaNumba

Viewers also liked (11)

Python Seminar PPT
Python Seminar PPTPython Seminar PPT
Python Seminar PPT
 
GPU Computing With Apache Spark And Python
GPU Computing With Apache Spark And PythonGPU Computing With Apache Spark And Python
GPU Computing With Apache Spark And Python
 
PyData Paris 2015 - Track 3.3 Antoine Pitrou
PyData Paris 2015 - Track 3.3 Antoine PitrouPyData Paris 2015 - Track 3.3 Antoine Pitrou
PyData Paris 2015 - Track 3.3 Antoine Pitrou
 
Numba Overview
Numba OverviewNumba Overview
Numba Overview
 
Scaling PyData Up and Out
Scaling PyData Up and OutScaling PyData Up and Out
Scaling PyData Up and Out
 
Scala.io
Scala.ioScala.io
Scala.io
 
The State of High-Performance Computing in the Open-Source R Ecosystem
The State of High-Performance Computing in the Open-Source R EcosystemThe State of High-Performance Computing in the Open-Source R Ecosystem
The State of High-Performance Computing in the Open-Source R Ecosystem
 
Buzzwords Numba Presentation
Buzzwords Numba PresentationBuzzwords Numba Presentation
Buzzwords Numba Presentation
 
Accelerate Your Python* Code through Profiling, Tuning, and Compilation Part ...
Accelerate Your Python* Code through Profiling, Tuning, and Compilation Part ...Accelerate Your Python* Code through Profiling, Tuning, and Compilation Part ...
Accelerate Your Python* Code through Profiling, Tuning, and Compilation Part ...
 
Numba: Flexible analytics written in Python with machine-code speeds and avo...
Numba:  Flexible analytics written in Python with machine-code speeds and avo...Numba:  Flexible analytics written in Python with machine-code speeds and avo...
Numba: Flexible analytics written in Python with machine-code speeds and avo...
 
Numba
NumbaNumba
Numba
 

Similar to Numba: Array-oriented Python Compiler for NumPy

Scaling Python to CPUs and GPUs
Scaling Python to CPUs and GPUsScaling Python to CPUs and GPUs
Scaling Python to CPUs and GPUs
Travis Oliphant
 
Swift for tensorflow
Swift for tensorflowSwift for tensorflow
Swift for tensorflow
규영 허
 
Pythran: Static compiler for high performance by Mehdi Amini PyData SV 2014
Pythran: Static compiler for high performance by Mehdi Amini PyData SV 2014Pythran: Static compiler for high performance by Mehdi Amini PyData SV 2014
Pythran: Static compiler for high performance by Mehdi Amini PyData SV 2014
PyData
 
Python高级编程(二)
Python高级编程(二)Python高级编程(二)
Python高级编程(二)
Qiangning Hong
 
SFSCON23 - Emily Bourne Yaman Güçlü - Pyccel write Python code, get Fortran ...
SFSCON23 - Emily Bourne Yaman Güçlü - Pyccel  write Python code, get Fortran ...SFSCON23 - Emily Bourne Yaman Güçlü - Pyccel  write Python code, get Fortran ...
SFSCON23 - Emily Bourne Yaman Güçlü - Pyccel write Python code, get Fortran ...
South Tyrol Free Software Conference
 
PyCon Estonia 2019
PyCon Estonia 2019PyCon Estonia 2019
PyCon Estonia 2019
Travis Oliphant
 
[Update] PyTorch Tutorial for NTU Machine Learing Course 2017
[Update] PyTorch Tutorial for NTU Machine Learing Course 2017[Update] PyTorch Tutorial for NTU Machine Learing Course 2017
[Update] PyTorch Tutorial for NTU Machine Learing Course 2017
Yu-Hsun (lymanblue) Lin
 
Python For Scientists
Python For ScientistsPython For Scientists
Python For Scientists
aeberspaecher
 
Machine learning with py torch
Machine learning with py torchMachine learning with py torch
Machine learning with py torch
Riza Fahmi
 
PyTorch Tutorial for NTU Machine Learing Course 2017
PyTorch Tutorial for NTU Machine Learing Course 2017PyTorch Tutorial for NTU Machine Learing Course 2017
PyTorch Tutorial for NTU Machine Learing Course 2017
Yu-Hsun (lymanblue) Lin
 
Scientific visualization with_gr
Scientific visualization with_grScientific visualization with_gr
Scientific visualization with_grJosef Heinen
 
Tensorflow in practice by Engineer - donghwi cha
Tensorflow in practice by Engineer - donghwi chaTensorflow in practice by Engineer - donghwi cha
Tensorflow in practice by Engineer - donghwi cha
Donghwi Cha
 
Travis Oliphant "Python for Speed, Scale, and Science"
Travis Oliphant "Python for Speed, Scale, and Science"Travis Oliphant "Python for Speed, Scale, and Science"
Travis Oliphant "Python for Speed, Scale, and Science"
Fwdays
 
Cluj.py Meetup: Extending Python in C
Cluj.py Meetup: Extending Python in CCluj.py Meetup: Extending Python in C
Cluj.py Meetup: Extending Python in C
Steffen Wenz
 
What&rsquo;s new in Visual C++
What&rsquo;s new in Visual C++What&rsquo;s new in Visual C++
What&rsquo;s new in Visual C++
Microsoft
 
Euro python2011 High Performance Python
Euro python2011 High Performance PythonEuro python2011 High Performance Python
Euro python2011 High Performance Python
Ian Ozsvald
 
Overview of Python - Bsides Detroit 2012
Overview of Python - Bsides Detroit 2012Overview of Python - Bsides Detroit 2012
Overview of Python - Bsides Detroit 2012
Tazdrumm3r
 
Natural Language Processing with CNTK and Apache Spark with Ali Zaidi
Natural Language Processing with CNTK and Apache Spark with Ali ZaidiNatural Language Processing with CNTK and Apache Spark with Ali Zaidi
Natural Language Processing with CNTK and Apache Spark with Ali Zaidi
Databricks
 
Simple APIs and innovative documentation
Simple APIs and innovative documentationSimple APIs and innovative documentation
Simple APIs and innovative documentation
PyDataParis
 
JIT compilation for CPython
JIT compilation for CPythonJIT compilation for CPython
JIT compilation for CPython
delimitry
 

Similar to Numba: Array-oriented Python Compiler for NumPy (20)

Scaling Python to CPUs and GPUs
Scaling Python to CPUs and GPUsScaling Python to CPUs and GPUs
Scaling Python to CPUs and GPUs
 
Swift for tensorflow
Swift for tensorflowSwift for tensorflow
Swift for tensorflow
 
Pythran: Static compiler for high performance by Mehdi Amini PyData SV 2014
Pythran: Static compiler for high performance by Mehdi Amini PyData SV 2014Pythran: Static compiler for high performance by Mehdi Amini PyData SV 2014
Pythran: Static compiler for high performance by Mehdi Amini PyData SV 2014
 
Python高级编程(二)
Python高级编程(二)Python高级编程(二)
Python高级编程(二)
 
SFSCON23 - Emily Bourne Yaman Güçlü - Pyccel write Python code, get Fortran ...
SFSCON23 - Emily Bourne Yaman Güçlü - Pyccel  write Python code, get Fortran ...SFSCON23 - Emily Bourne Yaman Güçlü - Pyccel  write Python code, get Fortran ...
SFSCON23 - Emily Bourne Yaman Güçlü - Pyccel write Python code, get Fortran ...
 
PyCon Estonia 2019
PyCon Estonia 2019PyCon Estonia 2019
PyCon Estonia 2019
 
[Update] PyTorch Tutorial for NTU Machine Learing Course 2017
[Update] PyTorch Tutorial for NTU Machine Learing Course 2017[Update] PyTorch Tutorial for NTU Machine Learing Course 2017
[Update] PyTorch Tutorial for NTU Machine Learing Course 2017
 
Python For Scientists
Python For ScientistsPython For Scientists
Python For Scientists
 
Machine learning with py torch
Machine learning with py torchMachine learning with py torch
Machine learning with py torch
 
PyTorch Tutorial for NTU Machine Learing Course 2017
PyTorch Tutorial for NTU Machine Learing Course 2017PyTorch Tutorial for NTU Machine Learing Course 2017
PyTorch Tutorial for NTU Machine Learing Course 2017
 
Scientific visualization with_gr
Scientific visualization with_grScientific visualization with_gr
Scientific visualization with_gr
 
Tensorflow in practice by Engineer - donghwi cha
Tensorflow in practice by Engineer - donghwi chaTensorflow in practice by Engineer - donghwi cha
Tensorflow in practice by Engineer - donghwi cha
 
Travis Oliphant "Python for Speed, Scale, and Science"
Travis Oliphant "Python for Speed, Scale, and Science"Travis Oliphant "Python for Speed, Scale, and Science"
Travis Oliphant "Python for Speed, Scale, and Science"
 
Cluj.py Meetup: Extending Python in C
Cluj.py Meetup: Extending Python in CCluj.py Meetup: Extending Python in C
Cluj.py Meetup: Extending Python in C
 
What&rsquo;s new in Visual C++
What&rsquo;s new in Visual C++What&rsquo;s new in Visual C++
What&rsquo;s new in Visual C++
 
Euro python2011 High Performance Python
Euro python2011 High Performance PythonEuro python2011 High Performance Python
Euro python2011 High Performance Python
 
Overview of Python - Bsides Detroit 2012
Overview of Python - Bsides Detroit 2012Overview of Python - Bsides Detroit 2012
Overview of Python - Bsides Detroit 2012
 
Natural Language Processing with CNTK and Apache Spark with Ali Zaidi
Natural Language Processing with CNTK and Apache Spark with Ali ZaidiNatural Language Processing with CNTK and Apache Spark with Ali Zaidi
Natural Language Processing with CNTK and Apache Spark with Ali Zaidi
 
Simple APIs and innovative documentation
Simple APIs and innovative documentationSimple APIs and innovative documentation
Simple APIs and innovative documentation
 
JIT compilation for CPython
JIT compilation for CPythonJIT compilation for CPython
JIT compilation for CPython
 

More from Travis Oliphant

Array computing and the evolution of SciPy, NumPy, and PyData
Array computing and the evolution of SciPy, NumPy, and PyDataArray computing and the evolution of SciPy, NumPy, and PyData
Array computing and the evolution of SciPy, NumPy, and PyData
Travis Oliphant
 
SciPy Latin America 2019
SciPy Latin America 2019SciPy Latin America 2019
SciPy Latin America 2019
Travis Oliphant
 
Keynote at Converge 2019
Keynote at Converge 2019Keynote at Converge 2019
Keynote at Converge 2019
Travis Oliphant
 
Standardizing arrays -- Microsoft Presentation
Standardizing arrays -- Microsoft PresentationStandardizing arrays -- Microsoft Presentation
Standardizing arrays -- Microsoft Presentation
Travis Oliphant
 
PyData Barcelona Keynote
PyData Barcelona KeynotePyData Barcelona Keynote
PyData Barcelona Keynote
Travis Oliphant
 
Python for Data Science with Anaconda
Python for Data Science with AnacondaPython for Data Science with Anaconda
Python for Data Science with Anaconda
Travis Oliphant
 
Fast and Scalable Python
Fast and Scalable PythonFast and Scalable Python
Fast and Scalable Python
Travis Oliphant
 
Scale up and Scale Out Anaconda and PyData
Scale up and Scale Out Anaconda and PyDataScale up and Scale Out Anaconda and PyData
Scale up and Scale Out Anaconda and PyData
Travis Oliphant
 
Python as the Zen of Data Science
Python as the Zen of Data SciencePython as the Zen of Data Science
Python as the Zen of Data Science
Travis Oliphant
 
Anaconda and PyData Solutions
Anaconda and PyData SolutionsAnaconda and PyData Solutions
Anaconda and PyData Solutions
Travis Oliphant
 
Continuum Analytics and Python
Continuum Analytics and PythonContinuum Analytics and Python
Continuum Analytics and Python
Travis Oliphant
 
Bids talk 9.18
Bids talk 9.18Bids talk 9.18
Bids talk 9.18
Travis Oliphant
 
Effectively using Open Source with conda
Effectively using Open Source with condaEffectively using Open Source with conda
Effectively using Open Source with conda
Travis Oliphant
 
London level39
London level39London level39
London level39
Travis Oliphant
 
PyData Boston 2013
PyData Boston 2013PyData Boston 2013
PyData Boston 2013
Travis Oliphant
 
Blaze: a large-scale, array-oriented infrastructure for Python
Blaze: a large-scale, array-oriented infrastructure for PythonBlaze: a large-scale, array-oriented infrastructure for Python
Blaze: a large-scale, array-oriented infrastructure for Python
Travis Oliphant
 
PyData Introduction
PyData IntroductionPyData Introduction
PyData Introduction
Travis Oliphant
 

More from Travis Oliphant (18)

Array computing and the evolution of SciPy, NumPy, and PyData
Array computing and the evolution of SciPy, NumPy, and PyDataArray computing and the evolution of SciPy, NumPy, and PyData
Array computing and the evolution of SciPy, NumPy, and PyData
 
SciPy Latin America 2019
SciPy Latin America 2019SciPy Latin America 2019
SciPy Latin America 2019
 
Keynote at Converge 2019
Keynote at Converge 2019Keynote at Converge 2019
Keynote at Converge 2019
 
Standardizing arrays -- Microsoft Presentation
Standardizing arrays -- Microsoft PresentationStandardizing arrays -- Microsoft Presentation
Standardizing arrays -- Microsoft Presentation
 
PyData Barcelona Keynote
PyData Barcelona KeynotePyData Barcelona Keynote
PyData Barcelona Keynote
 
Python for Data Science with Anaconda
Python for Data Science with AnacondaPython for Data Science with Anaconda
Python for Data Science with Anaconda
 
Fast and Scalable Python
Fast and Scalable PythonFast and Scalable Python
Fast and Scalable Python
 
Scale up and Scale Out Anaconda and PyData
Scale up and Scale Out Anaconda and PyDataScale up and Scale Out Anaconda and PyData
Scale up and Scale Out Anaconda and PyData
 
Python as the Zen of Data Science
Python as the Zen of Data SciencePython as the Zen of Data Science
Python as the Zen of Data Science
 
Anaconda and PyData Solutions
Anaconda and PyData SolutionsAnaconda and PyData Solutions
Anaconda and PyData Solutions
 
Continuum Analytics and Python
Continuum Analytics and PythonContinuum Analytics and Python
Continuum Analytics and Python
 
Bids talk 9.18
Bids talk 9.18Bids talk 9.18
Bids talk 9.18
 
Effectively using Open Source with conda
Effectively using Open Source with condaEffectively using Open Source with conda
Effectively using Open Source with conda
 
London level39
London level39London level39
London level39
 
PyData Boston 2013
PyData Boston 2013PyData Boston 2013
PyData Boston 2013
 
Blaze: a large-scale, array-oriented infrastructure for Python
Blaze: a large-scale, array-oriented infrastructure for PythonBlaze: a large-scale, array-oriented infrastructure for Python
Blaze: a large-scale, array-oriented infrastructure for Python
 
Numba lightning
Numba lightningNumba lightning
Numba lightning
 
PyData Introduction
PyData IntroductionPyData Introduction
PyData Introduction
 

Recently uploaded

UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4
DianaGray10
 
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 previewState of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
Prayukth K V
 
Monitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR EventsMonitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR Events
Ana-Maria Mihalceanu
 
Knowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and backKnowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and back
Elena Simperl
 
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdfFIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance
 
Key Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdfKey Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdf
Cheryl Hung
 
GraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge GraphGraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge Graph
Guy Korland
 
Accelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish CachingAccelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish Caching
Thijs Feryn
 
Connector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a buttonConnector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a button
DianaGray10
 
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
Product School
 
The Future of Platform Engineering
The Future of Platform EngineeringThe Future of Platform Engineering
The Future of Platform Engineering
Jemma Hussein Allen
 
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
Product School
 
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
Sri Ambati
 
When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...
Elena Simperl
 
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
James Anderson
 
Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...
Product School
 
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Ramesh Iyer
 
JMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and GrafanaJMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and Grafana
RTTS
 
UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3
DianaGray10
 
Generating a custom Ruby SDK for your web service or Rails API using Smithy
Generating a custom Ruby SDK for your web service or Rails API using SmithyGenerating a custom Ruby SDK for your web service or Rails API using Smithy
Generating a custom Ruby SDK for your web service or Rails API using Smithy
g2nightmarescribd
 

Recently uploaded (20)

UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4
 
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 previewState of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
 
Monitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR EventsMonitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR Events
 
Knowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and backKnowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and back
 
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdfFIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdf
 
Key Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdfKey Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdf
 
GraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge GraphGraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge Graph
 
Accelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish CachingAccelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish Caching
 
Connector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a buttonConnector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a button
 
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
 
The Future of Platform Engineering
The Future of Platform EngineeringThe Future of Platform Engineering
The Future of Platform Engineering
 
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
 
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
 
When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...
 
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
 
Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...
 
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
 
JMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and GrafanaJMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and Grafana
 
UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3
 
Generating a custom Ruby SDK for your web service or Rails API using Smithy
Generating a custom Ruby SDK for your web service or Rails API using SmithyGenerating a custom Ruby SDK for your web service or Rails API using Smithy
Generating a custom Ruby SDK for your web service or Rails API using Smithy
 

Numba: Array-oriented Python Compiler for NumPy

  • 1. Numba: An array-oriented Python compiler SIAM Conference on Computational Science and Engineering Travis E. Oliphant February 25, 2012
  • 2. Big Picture Empower domain experts with high-level tools that exploit modern hard-ware ? Array Oriented Computing
  • 3. Software Stack Future? Plateaus of Code re-use + DSLs SQL R TDPL Matlab Python OBJC Julia C FORTRAN C++ LLVM
  • 4. Motivation • Python is great for rapid development and high-level thinking-in-code • It is slow for interior loops because lack of type information leads to a lot of indirection and “extra” code.
  • 5. Motivation • NumPy users have had a lot of type information for a long time --- but only currently have one-size fits all pre- compiled, vectorized loops. • Idea is to use this type information to allow compilation of arbitrary expressions involving NumPy arrays
  • 6. Current approaches • Cython • Weave • Write fast-code in C/C++/Fortran and “wrap” with - SWIG or Cython - f2py or fwrap - hand-written Numba philosophy : don’t wrap or rewrite --- just decorate
  • 7. Simple API • jit --- provide type information (fastest to call) • autojit --- detects input types, infers output, generates code if needed, and dispatches (a little more expensive to call) @jit('void(double[:,:], double, double)') #@autojit def numba_update(u, dx2, dy2): nx, ny = u.shape for i in xrange(1,nx-1): for j in xrange(1, ny-1): u[i,j] = ((u[i+1,j] + u[i-1,j]) * dy2 + (u[i,j+1] + u[i,j-1]) * dx2) / (2*(dx2+dy2))
  • 8. ~150x speed-up Real-time image processing in Python (50 fps Mandelbrot)
  • 9. Image Processing ~1500x speed-up @jit('void(f8[:,:],f8[:,:],f8[:,:])') def filter(image, filt, output): M, N = image.shape m, n = filt.shape for i in range(m//2, M-m//2): for j in range(n//2, N-n//2): result = 0.0 for k in range(m): for l in range(n): result += image[i+k-m//2,j+l-n//2]*filt[k, l] output[i,j] = result
  • 10. NumPy + Mamba = Numba Python Function Machine Code LLVM-PY LLVM Library ISPC OpenCL OpenMP CUDA CLANG Intel AMD Nvidia Apple ARM
  • 11. Example @jit(‘f8(f8)’) def sinc(x): if x==0.0: return 1.0 else: return sin(x*pi)/(pi*x) Numba
  • 12. Compiler Overview C++ x86 C LLVM IR ARM Fortran PTX Python Numba turns Python into a “compiled language” (but much more flexible)
  • 13. Compile NumPy array expressions import numbapro from numba import autojit @autojit def formula(a, b, c): a[1:,1:] = a[1:,1:] + b[1:,:-1] + c[1:,:-1] @autojit def express(m1, m2): m2[1:-1:2,0,...,::2] = (m1[1:-1:2,...,::2] * m1[-2:1:-2,...,::2]) return m2
  • 14. Fast vectorize NumPy’s ufuncs take “kernels” and apply the kernel element-by-element over entire arrays Write kernels in from numbapro import vectorize from math import sin Python! @vectorize([‘f8(f8)’, ‘f4(f4)’]) def sinc(x): if x==0.0: return 1.0 else: return sin(x*pi)/(pi*x)
  • 15. Updated Laplace Example https://github.com/teoliphant/speed.git Version Time Speed Up NumPy 3.19 1.0 Numba 2.32 1.38 Vect. Numba 2.33 1.37 Cython 2.38 1.34 Weave 2.47 1.29 Numexpr 2.62 1.22 Fortran Loops 2.30 1.39 Vect. Fortran 1.50 2.13
  • 16. Many Advanced Features • Extension classes (jit a class) • Struct support (NumPy arrays can be structs) • SSA --- can refer to local variables as different types • Typed lists and typed dictionaries coming • pointer support • calling ctypes and CFFI functions natively • pycc (create stand-alone dynamic library and executable) • pycc --python (create static extension module for Python)
  • 17. Ufuncs Generalized UFuncs Python Function Window Kernel Funcs Function- Uses of Numba based Indexing Memory Filters Numba NumPy Runtime I/O Filters Reduction Filters Computed Columns function pointer
  • 18. Status • Main team sponsored by Continuum Analytics and composed of: - Travis Oliphant (NumPy, SciPy) - Jon Riehl (Mython, PyFront, Basil, ...) - Mark Florrison (minivect, Cython) - Siu Kwan Lam (pymothoa, llvmpy) • Rapid progress this year • Version 0.6 released first of February • Version 0.7 next week numba.pydata.org • Version 0.8 first of April • Stable API (jit, autojit) easy to use • Full Python support by 1.0 at end of summer • Should be able to write equivalent of NumPy and SciPy with Numba
  • 19. Introducing NumbaPro • Create parallel-for loops • Parallel execution of ufuncs fast development and fast • Run ufuncs on the GPU execution! • Write CUDA directly in Python! Python and NumPy compiled to • Free for Academics Parallel Architectures (GPUs and multi-core machines)
  • 20. Create parallel-for loops import numbapro # import first to make prange available from numba import autojit, prange @autojit def parallel_sum2d(a): sum = 0.0 for i in prange(a.shape[0]): for j in range(a.shape[1]): sum += a[i,j]
  • 21. Ufuncs in parallel (multi-core or GPU) from numbapro import vectorize from math import sin @vectorize([‘f8(f8)’, ‘f4(f4)’], target=‘gpu’) def sinc(x): if x==0.0: return 1.0 else: return sin(x*pi)/(pi*x) @vectorize([‘f8(f8)’, ‘f4(f4)’], target=‘parallel’) def sinc2(x): if x==0.0: return 1.0 else: return sin(x*pi)/(pi*x)
  • 22. Introducing CUDA-Python from numbapro import cuda from numba import autojit @autojit(target=‘gpu’) def array_scale(src, dst, scale): tid = cuda.threadIdx.x blkid = cuda.blockIdx.x blkdim = cuda.blockDim.x CUDA Development i = tid + blkid * blkdim if i >= n: using Python syntax! return dst[i] = src[i] * scale src = np.arange(N, dtype=np.float) dst = np.empty_like(src) array_scale[grid, block](src, dst, 5.0)
  • 23. Example: Matrix multiply @cuda.jit(argtypes=[f4[:,:], f4[:,:], f4[:,:]]) def cu_square_matrix_mul(A, B, C): sA = cuda.shared.array(shape=(tpb, tpb), dtype=f4) sB = cuda.shared.array(shape=(tpb, tpb), dtype=f4) bpg = 50 tpb = 32 tx = cuda.threadIdx.x n = bpg * tpb ty = cuda.threadIdx.y bx = cuda.blockIdx.x A = np.array(np.random.random((n, n)), by = cuda.blockIdx.y dtype=np.float32) bw = cuda.blockDim.x bh = cuda.blockDim.y B = np.array(np.random.random((n, n)), dtype=np.float32) x = tx + bx * bw C = np.empty_like(A) y = ty + by * bh stream = cuda.stream() acc = 0. with stream.auto_synchronize(): for i in range(bpg): dA = cuda.to_device(A, stream) if x < n and y < n: dB = cuda.to_device(B, stream) sA[ty, tx] = A[y, tx + i * tpb] sB[ty, tx] = B[ty + i * tpb, x] dC = cuda.to_device(C, stream) cu_square_matrix_mul[(bpg, bpg), cuda.syncthreads() (tpb, tpb), stream](dA, dB, dC) dC.to_host(stream) if x < n and y < n: for j in range(tpb): acc += sA[ty, j] * sB[j, tx] cuda.syncthreads() if x < n and y < n: C[y, x] = acc
  • 24. Early Performance Results core GeForce GTX i7 560 Ti Already about 6x faster on the GPU.
  • 25. Example: Black-Scholes @cuda.jit(argtypes=(double,), restype=double, device=True, inline=True) def cnd_cuda(d): @cuda.jit(argtypes=(double[:], double[:], A1 = 0.31938153 double[:], A2 = -0.356563782 double[:], double[:], A3 = 1.781477937 double, double)) A4 = -1.821255978 def black_scholes_cuda(callResult, putResult, A5 = 1.330274429 S, X, T, R, V): RSQRT2PI = 0.39894228040143267793994605993438 # S = stockPrice K = 1.0 / (1.0 + 0.2316419 * math.fabs(d)) # X = optionStrike ret_val = (RSQRT2PI * math.exp(-0.5 * d * d) * # T = optionYears (K * (A1 + K * (A2 + K * (A3 + K * (A4 + K * A5)))))) # R = Riskfree if d > 0: # V = Volatility ret_val = 1.0 - ret_val i = cuda.threadIdx.x + cuda.blockIdx.x * return ret_val cuda.blockDim.x blockdim = 1024, 1 if i >= S.shape[0]: griddim = int(math.ceil(float(OPT_N)/blockdim[0])), 1 return stream = cuda.stream() sqrtT = math.sqrt(T[i]) d_callResult = cuda.to_device(callResultNumbapro, d1 = (math.log(S[i] / X[i]) + stream) (R + 0.5 * V * V) * T[i]) / (V * d_putResult = cuda.to_device(putResultNumbapro, sqrtT) stream) d2 = d1 - V * sqrtT d_stockPrice = cuda.to_device(stockPrice, stream) cndd1 = cnd_cuda(d1) d_optionStrike = cuda.to_device(optionStrike, stream) cndd2 = cnd_cuda(d2) d_optionYears = cuda.to_device(optionYears, stream) for i in range(iterations): expRT = math.exp((-1. * R) * T[i]) black_scholes_cuda[griddim, blockdim, stream]( callResult[i] = (S[i] * cndd1 - X[i] * d_callResult, d_putResult, d_stockPrice, expRT * cndd2) d_optionStrike, putResult[i] = (X[i] * expRT * (1.0 - d_optionYears, RISKFREE, VOLATILITY) cndd2) - d_callResult.to_host(stream) S[i] * (1.0 - d_putResult.to_host(stream) cndd1)) stream.synchronize()
  • 26. Black-Scholes: Results core GeForce GTX i7 560 Ti Already about 6x faster on the GPU
  • 27. NumFOCUS www.numfocus.org 501(c)3 Public Charity
  • 28. NumFOCUS Mission • Sponsor development of high-level languages and libraries for science • Foster teaching of array-oriented and higher-order computational approaches and applied computational science • Promote the use of open code in science and encourage reproducible and accessible research
  • 29. NumFOCUS Activities • Sponsor sprints and conferences • Provide scholarships and grants • Provide bounties and prizes for code development • Pay for freely-available documentation and basic course development • Equipment grants • Sponsor BootCamps • Raise funds from industries using open source high-level languages