
Buzzwords Numba Presentation



  1. Compiling Python to Native Code for Speed and Scale
     David Kammeyer, Continuum Analytics, kammeyer@continuum.io
     Tuesday, June 4, 13
  2. Continuum Background
       • Python for Big Data and Science
       • Founded by Travis Oliphant (creator of NumPy) and Peter Wang in 2012
       • 45 employees
  3. About Continuum Analytics
     Focus areas: Enterprise Python, Scientific Computing, Data Processing, Data Analysis, Visualisation, Scalable Computing
       • Products
       • Training
       • Support
       • Consulting
  4. Products
       • Anaconda: Easy-to-install Python distribution, including the most popular open-source scientific and mathematical libraries. (Free!)
       • Accelerate: Opens up the full capabilities of the GPU or multi-core processor to Python.
       • IOPro: Fast loading of data from files, SQL, and NoSQL stores, improving performance and reducing memory overhead.
       • Wakari: Browser-based Python and Linux environment for collaborative data analysis, exploration, and visualization. (Small instance is free!)
  5. Open Source Projects
       • Blaze: High-performance Python library for modern vector computing, distributed and streaming data
       • Bokeh: Interactive, grammar-based visualization system for large datasets
       • Numba: Vectorizing Python compiler for multicore and GPU, using LLVM
  6. Numba
       • Just-in-time, dynamic compiler for Python
       • Optimizes data-parallel computations at call time, to take advantage of the local hardware configuration
       • Compatible with NumPy and Blaze
       • Leverages the LLVM ecosystem:
           • Optimization passes
           • Inter-op with other languages
           • Variety of backends (e.g. CUDA for GPU support)
  7. LLVM
     [Diagram: front ends (C, C++, Fortran, Python) feed LLVM IR, which targets x86, ARM, and PTX back ends]
       • Leverages the LLVM ecosystem:
           • Optimization passes
           • Inter-op with other languages
           • Variety of backends (e.g. CUDA for GPU support)
  8. Simple API

         from numba import autojit

         #@jit(void(double[:,:], double, double))
         @autojit
         def numba_update(u, dx2, dy2):
             # One in-place sweep over the interior points of the 2-D grid u.
             nx, ny = u.shape
             for i in xrange(1, nx-1):
                 for j in xrange(1, ny-1):
                     u[i,j] = ((u[i+1,j] + u[i-1,j]) * dy2 +
                               (u[i,j+1] + u[i,j-1]) * dx2) / (2*(dx2+dy2))

     Comment out one of jit or autojit (don't use them together).
       • jit: you provide the type information (fastest to call at run time)
       • autojit: detects input types, infers the output type, generates code if needed, and dispatches (a little more run-time call overhead)
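The trade-off between the two decorators can be pictured with a toy dispatcher. The sketch below is not Numba code and compiles nothing: `lazy_specialize` and `eager_specialize` are hypothetical names, and "compilation" is simulated by recording which argument-type signatures have been seen. The strict type check in the eager version is a simplification made for the illustration.

```python
import functools

def lazy_specialize(func):
    """Toy stand-in for autojit: specialize on the first call per argument types."""
    cache = {}  # maps a tuple of argument types to a "compiled" callable

    @functools.wraps(func)
    def dispatcher(*args):
        signature = tuple(type(a) for a in args)
        if signature not in cache:
            # A real JIT would generate machine code here; we only record it.
            cache[signature] = func
        return cache[signature](*args)

    dispatcher.specializations = cache
    return dispatcher

def eager_specialize(*arg_types):
    """Toy stand-in for jit: the signature is fixed when the function is defined."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args):
            actual = tuple(type(a) for a in args)
            if actual != arg_types:
                raise TypeError("expected %r, got %r" % (arg_types, actual))
            return func(*args)
        return wrapper
    return decorator

@lazy_specialize
def scale(x, factor):
    return x * factor

@eager_specialize(float, float)
def scale_f8(x, factor):
    return x * factor

scale(2, 3)        # first (int, int) call: creates a specialization
scale(2.0, 3.0)    # first (float, float) call: creates another
scale(4.0, 0.5)    # reuses the (float, float) specialization
print(len(scale.specializations))
print(scale_f8(2.0, 3.0))
```

The lazy version pays for a type lookup on every call, which mirrors the extra call overhead the slide attributes to autojit; the eager version skips the lookup because its signature is known up front.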
  9. Example

         from math import sin, pi
         from numba import jit

         @jit('f8(f8)')
         def sinc(x):
             if x == 0.0:
                 return 1.0
             else:
                 return sin(x*pi)/(pi*x)
  10. Compile NumPy array expressions

          from numba import autojit

          @autojit
          def formula(a, b, c):
              a[1:,1:] = a[1:,1:] + b[1:,:-1] + c[1:,:-1]

          @autojit
          def express(m1, m2):
              m2[1:-1:2,0,...,::2] = (m1[1:-1:2,...,::2] *
                                      m1[-2:1:-2,...,::2])
              return m2
  11. Fast vectorize
      NumPy's ufuncs take "kernels" and apply the kernel element by element over entire arrays. Write kernels in Python!

          from numbapro import vectorize
          from math import sin, pi

          @vectorize(['f8(f8)', 'f4(f4)'])
          def sinc(x):
              if x == 0.0:
                  return 1.0
              else:
                  return sin(x*pi)/(pi*x)
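What "apply the kernel element by element" means can be shown without any compiler. The following is a minimal pure-Python sketch, assuming a hypothetical helper `apply_elementwise` that plays the role a ufunc plays for arrays; it illustrates the semantics only, not the speed.

```python
from math import sin, pi

def sinc(x):
    """Scalar kernel: normalized sinc, defined as 1 at x == 0."""
    if x == 0.0:
        return 1.0
    return sin(x * pi) / (pi * x)

def apply_elementwise(kernel, values):
    """Hypothetical helper: call a scalar kernel once per element."""
    return [kernel(v) for v in values]

samples = [0.0, 0.5, 1.0]
results = apply_elementwise(sinc, samples)
print(results)
```

The kernel itself never sees a container, only one scalar at a time, which is why the same function body can be compiled once per scalar signature such as `f8(f8)` or `f4(f4)`.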
  12. Create parallel-for loops
      "prange" directive that spawns compiled tasks in threads (like the OpenMP parallel-for pragma)

          import numbapro
          from numba import autojit, prange

          @autojit
          def parallel_sum2d(a):
              sum = 0.0
              for i in prange(a.shape[0]):      # outer loop runs in parallel
                  for j in range(a.shape[1]):
                      sum += a[i,j]
              return sum                        # return added; the slide omits it
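The structure behind a parallel-for reduction can be sketched with the standard library alone. The example below swaps in `concurrent.futures.ThreadPoolExecutor` for prange and nested lists for arrays, so it is an analogy rather than Numba code; `row_sum` and the `workers` parameter are names chosen for this illustration. In CPython the threads will not make this faster, the point is only how the work is split and recombined.

```python
from concurrent.futures import ThreadPoolExecutor

def row_sum(row):
    """Work for one iteration of the outer loop: sum a single row."""
    total = 0.0
    for value in row:
        total += value
    return total

def parallel_sum2d(a, workers=4):
    """Sum a 2-D nested list, one task per row, then reduce the partial sums."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial_sums = list(pool.map(row_sum, a))
    return sum(partial_sums)

grid = [[float(i * 3 + j) for j in range(3)] for i in range(4)]
print(parallel_sum2d(grid))
```

Each outer iteration touches only its own row, so the iterations are independent; the only shared step is the final reduction over the partial sums.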
  13. Example: Mandelbrot, Vectorized

          from numbapro import vectorize
          from numba import uint8, uint32, f4

          sig = uint8(uint32, f4, f4, f4, f4, uint32, uint32, uint32)

          @vectorize([sig], target='gpu')
          def mandel(tid, min_x, max_x, min_y, max_y, width, height, iters):
              pixel_size_x = (max_x - min_x) / width
              pixel_size_y = (max_y - min_y) / height
              x = tid % width
              y = tid / width
              real = min_x + x * pixel_size_x
              imag = min_y + y * pixel_size_y
              c = complex(real, imag)
              z = 0.0j
              for i in range(iters):
                  z = z * z + c
                  if (z.real * z.real + z.imag * z.imag) >= 4:
                      return i
              return 255

      Kind      Time      Speed-up
      Python    263.6     1.0x
      CPU       2.639     100x
      GPU       0.1676    1573x

      GPU: Tesla S2050
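The kernel logic and the speed-up column can both be checked in plain Python. This is a sketch under stated assumptions: it runs on the CPU with no decorator, uses floor division (`//`) to turn the flat pixel id into a row index, and the 4x4 image, the coordinate bounds, and the iteration count are made-up values for the illustration. Only the three timings come from the slide.

```python
def mandel(tid, min_x, max_x, min_y, max_y, width, height, iters):
    """Escape-time count for the pixel with flat index tid."""
    pixel_size_x = (max_x - min_x) / width
    pixel_size_y = (max_y - min_y) / height
    x = tid % width
    y = tid // width  # floor division: row index of the flat pixel id
    c = complex(min_x + x * pixel_size_x, min_y + y * pixel_size_y)
    z = 0.0j
    for i in range(iters):
        z = z * z + c
        if (z.real * z.real + z.imag * z.imag) >= 4:
            return i
    return 255

width, height = 4, 4
image = [mandel(tid, -2.0, 2.0, -2.0, 2.0, width, height, 20)
         for tid in range(width * height)]

# Speed-ups recomputed from the timings quoted on the slide.
timings = {"Python": 263.6, "CPU": 2.639, "GPU": 0.1676}
speedups = {kind: timings["Python"] / t for kind, t in timings.items()}
print(image[0], image[10])
print({kind: round(s, 1) for kind, s in speedups.items()})
```

Pixel 0 maps to c = -2-2j and escapes on the first iteration, while pixel 10 maps to c = 0 and never escapes, so it gets the fallback value 255. The recomputed ratios come out at roughly 99.9 and 1572.8, which the slide rounds to 100x and 1573x.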
  14. Many More Advanced Features!
       • Extension classes (jit a class; autojit coming soon!)
       • Struct support (NumPy arrays can be structs)
       • SSA: can refer to local variables as different types
       • Typed lists, typed dictionaries, and sets coming soon!
       • Calling ctypes and CFFI functions natively
       • pycc (create a stand-alone dynamic library and executable)
       • pycc --python (create a static extension module for Python)
  15. Availability
       • Core is Open Source
       • GPU Compilation and Parallelization available in Anaconda Accelerate, €100.
  16. Questions?
      http://continuum.io
      kammeyer@continuum.io