Buzzwords Numba Presentation

  1. Compiling Python to Native Code for Speed and Scale
     David Kammeyer, Continuum Analytics
     kammeyer@continuum.io
     Tuesday, June 4, 2013
  2. Continuum Background
     • Python for Big Data and Science
     • Founded by Travis Oliphant (creator of NumPy) and Peter Wang in 2012
     • 45 employees
  3. About Continuum Analytics
     Focus areas: Enterprise Python, Scientific Computing, Data Processing, Data Analysis, Visualisation, Scalable Computing
     • Products
     • Training
     • Support
     • Consulting
  4. Products
     • Anaconda: Easy-to-install Python distribution, including the most popular open-source scientific and mathematical libraries. (Free!)
     • Accelerate: Opens up the full capabilities of the GPU or multi-core processor to Python.
     • IOPro: Fast loading of data from files, SQL, and NoSQL stores, improving performance and reducing memory overhead.
     • Wakari: Browser-based Python and Linux environment for collaborative data analysis, exploration, and visualization. (Small instance is free!)
  5. Open Source Projects
     • Blaze: High-performance Python library for modern vector computing, distributed and streaming data
     • Bokeh: Interactive, grammar-based visualization system for large datasets
     • Numba: Vectorizing Python compiler for multicore and GPU, using LLVM
  6. Numba
     • Just-in-time, dynamic compiler for Python
     • Optimizes data-parallel computations at call time, to take advantage of the local hardware configuration
     • Compatible with NumPy and Blaze
     • Leverages the LLVM ecosystem:
       • Optimization passes
       • Inter-op with other languages
       • Variety of backends (e.g. CUDA for GPU support)
  7. LLVM
     [Diagram: C, C++, Fortran, and Python feed into LLVM IR, which targets x86, ARM, and PTX]
     • Leverages the LLVM ecosystem:
       • Optimization passes
       • Inter-op with other languages
       • Variety of backends (e.g. CUDA for GPU support)
  8. Simple API

         from numba import autojit  # or: from numba import jit, void, double

         #@jit(void(double[:,:], double, double))
         @autojit
         def numba_update(u, dx2, dy2):
             nx, ny = u.shape
             for i in xrange(1, nx-1):
                 for j in xrange(1, ny-1):
                     u[i,j] = ((u[i+1,j] + u[i-1,j]) * dy2 +
                               (u[i,j+1] + u[i,j-1]) * dx2) / (2*(dx2+dy2))

     Comment out one of jit or autojit (don't use them together):
     • jit: you provide the type information (fastest to call at run time)
     • autojit: detects input types, infers the output type, generates code if needed, and dispatches (a little more run-time call overhead)
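In Numba releases since this talk, `autojit` has been folded into `jit`: passing a signature compiles eagerly, and omitting it compiles lazily on the first call. Below is a minimal sketch of both modes, applied to a scalar version of the stencil update. The names `stencil_eager` and `stencil_lazy` are made up for illustration, and the `try`/`except` fallback is an assumption added so the snippet still runs (uncompiled) where Numba is not installed.

```python
try:
    from numba import jit
except ImportError:
    # Assumption for portability: without Numba, fall back to a no-op
    # decorator so the functions below still run as plain Python.
    def jit(*args, **kwargs):
        if len(args) == 1 and callable(args[0]) and not kwargs:
            return args[0]          # used as bare @jit
        return lambda func: func    # used as @jit(...) with arguments


# Eager: the signature is given, so compilation happens at decoration time.
@jit("f8(f8, f8, f8, f8, f8, f8)", nopython=True)
def stencil_eager(up, down, left, right, dx2, dy2):
    return ((up + down) * dy2 + (left + right) * dx2) / (2 * (dx2 + dy2))


# Lazy: no signature, so types are inferred and code is generated
# the first time the function is called with a given set of types.
@jit(nopython=True)
def stencil_lazy(up, down, left, right, dx2, dy2):
    return ((up + down) * dy2 + (left + right) * dx2) / (2 * (dx2 + dy2))
```

Both functions compute the same value; the difference is only when compilation happens and whether other argument types can be accepted later.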
  9. Example

         from math import sin, pi
         from numba import jit

         @jit('f8(f8)')
         def sinc(x):
             if x == 0.0:
                 return 1.0
             else:
                 return sin(x*pi) / (pi*x)
  10. Compile NumPy array expressions

          from numba import autojit

          @autojit
          def formula(a, b, c):
              a[1:,1:] = a[1:,1:] + b[1:,:-1] + c[1:,:-1]

          @autojit
          def express(m1, m2):
              m2[1:-1:2,0,...,::2] = (m1[1:-1:2,...,::2] *
                                      m1[-2:1:-2,...,::2])
              return m2
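To make the slicing in `formula` concrete, here is a plain-Python loop that should be equivalent to its single array statement. This is an illustration of the indexing only, not of what Numba generates. It uses nested lists instead of NumPy arrays so that it has no dependencies, and the name `formula_loops` is hypothetical.

```python
def formula_loops(a, b, c):
    """Loop form of: a[1:, 1:] = a[1:, 1:] + b[1:, :-1] + c[1:, :-1]."""
    rows, cols = len(a), len(a[0])
    for i in range(1, rows):          # a[1:, ...] skips row 0
        for j in range(1, cols):      # a[..., 1:] skips column 0
            # Column j of `a` pairs with column j - 1 of `b` and `c`,
            # because the slice :-1 starts at column 0.
            a[i][j] = a[i][j] + b[i][j - 1] + c[i][j - 1]
    return a
```

Each output element reads only its own position in `a`, so updating in place is safe here.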
  11. Fast vectorize
      NumPy's ufuncs take "kernels" and apply the kernel element-by-element over entire arrays. Write kernels in Python!

          from numbapro import vectorize
          from math import sin, pi

          @vectorize(['f8(f8)', 'f4(f4)'])
          def sinc(x):
              if x == 0.0:
                  return 1.0
              else:
                  return sin(x*pi) / (pi*x)
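In current open-source Numba, `vectorize` can be imported from `numba` itself. The sketch below defines the same `sinc` kernel that way and applies it to a small sequence. The `try`/`except` fallback is an assumption added for portability: without Numba it substitutes a pure-Python stand-in that only mimics the element-by-element behaviour.

```python
from math import sin, pi

try:
    from numba import vectorize
except ImportError:
    # Assumption for portability: a pure-Python stand-in that maps the
    # kernel over a sequence, mimicking element-wise ufunc behaviour.
    def vectorize(signatures):
        def wrap(kernel):
            def ufunc_like(values):
                return [kernel(v) for v in values]
            return ufunc_like
        return wrap


# Narrower types are listed first so dispatch picks the most specific loop.
@vectorize(["f4(f4)", "f8(f8)"])
def sinc(x):
    if x == 0.0:
        return 1.0
    else:
        return sin(x * pi) / (pi * x)


# The scalar kernel is applied to every element of the input.
values = list(sinc([0.0, 0.5, 1.0]))
```

The kernel is written for one scalar at a time; the decorator supplies the loop over the input.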
  12. Create parallel-for loops
      "prange" directive that spawns compiled tasks in threads (like the OpenMP parallel-for pragma)

          import numbapro
          from numba import autojit, prange

          @autojit
          def parallel_sum2d(a):
              sum = 0.0
              for i in prange(a.shape[0]):
                  for j in range(a.shape[1]):
                      sum += a[i,j]
              return sum
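In current open-source Numba, `prange` takes effect when the function is compiled with `parallel=True`. The sketch below shows the same reduction pattern with that spelling. To stay free of a NumPy dependency it sums `i * j` over an index grid rather than reading an array, which is a simplification. The name `parallel_sum` is hypothetical, and the serial fallback is an assumption added for portability.

```python
try:
    from numba import njit, prange
except ImportError:
    # Assumption for portability: run serially as plain Python.
    prange = range

    def njit(*args, **kwargs):
        if len(args) == 1 and callable(args[0]) and not kwargs:
            return args[0]
        return lambda func: func


@njit(parallel=True)
def parallel_sum(n_rows, n_cols):
    total = 0.0
    for i in prange(n_rows):      # iterations may run on different threads
        for j in range(n_cols):   # inner loop stays serial
            total += i * j        # scalar reduction, combined across threads
    return total
```

Replacing `i * j` with `a[i, j]` and the bounds with `a.shape` gives the two-dimensional array sum from the slide.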
  13. Example: Mandelbrot Vectorized

          from numba import uint8, uint32, f4
          from numbapro import vectorize

          sig = uint8(uint32, f4, f4, f4, f4, uint32, uint32, uint32)

          @vectorize([sig], target='gpu')
          def mandel(tid, min_x, max_x, min_y, max_y, width, height, iters):
              pixel_size_x = (max_x - min_x) / width
              pixel_size_y = (max_y - min_y) / height
              x = tid % width
              y = tid / width
              real = min_x + x * pixel_size_x
              imag = min_y + y * pixel_size_y
              c = complex(real, imag)
              z = 0.0j
              for i in range(iters):
                  z = z * z + c
                  if (z.real * z.real + z.imag * z.imag) >= 4:
                      return i
              return 255

      Kind     Time     Speed-up
      Python   263.6    1.0x
      CPU      2.639    100x
      GPU      0.1676   1573x
      Tesla S2050
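For checking the kernel's logic without a GPU, it can be run as ordinary Python. This is a minimal CPU-only sketch, not the benchmarked code from the slide. The names `mandel_cpu` and `render` are hypothetical, and integer division is written explicitly as `//` so the row index stays an integer under Python 3.

```python
def mandel_cpu(tid, min_x, max_x, min_y, max_y, width, height, iters):
    """Escape-time count for the pixel with flat index `tid`."""
    pixel_size_x = (max_x - min_x) / width
    pixel_size_y = (max_y - min_y) / height
    x = tid % width    # column of this pixel
    y = tid // width   # row of this pixel (explicit integer division)
    real = min_x + x * pixel_size_x
    imag = min_y + y * pixel_size_y
    c = complex(real, imag)
    z = 0.0j
    for i in range(iters):
        z = z * z + c
        if (z.real * z.real + z.imag * z.imag) >= 4:
            return i   # escaped after i iterations
    return 255         # treated as inside the set


def render(min_x, max_x, min_y, max_y, width, height, iters):
    """Evaluate every pixel serially; the vectorized version does this per element."""
    return [
        mandel_cpu(tid, min_x, max_x, min_y, max_y, width, height, iters)
        for tid in range(width * height)
    ]
```

The flat `tid` index is what lets the same scalar kernel be applied over a one-dimensional array of pixel ids.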
  14. Many More Advanced Features!
      • Extension classes (jit a class; autojit coming soon!)
      • Struct support (NumPy arrays can be structs)
      • SSA: can refer to local variables as different types
      • Typed lists, typed dictionaries, and sets coming soon!
      • Calling ctypes and CFFI functions natively
      • pycc (create stand-alone dynamic library and executable)
      • pycc --python (create static extension module for Python)
  15. Availability
      • Core is open source
      • github.com/numba/numba
      • GPU compilation and parallelization available in Anaconda Accelerate, €100.
  16. Questions?
      http://continuum.io
      kammeyer@continuum.io
