Pythran: Static compiler for high performance by Mehdi Amini PyData SV 2014

Pythran is an ahead-of-time compiler that turns modules written in a large subset of Python into C++ meta-programs that can be compiled into efficient native modules. It mainly targets compute-intensive parts of the code, so it comes as no surprise that it focuses on scientific applications that make extensive use of Numpy. Under the hood, Pythran analyses the program interprocedurally and performs high-level optimizations and parallel code generation. Parallelism can be found implicitly in Python intrinsics or Numpy operations, or specified explicitly by the programmer using OpenMP directives directly in the Python source code. Either way, the input code remains fully compatible with the Python interpreter. While the idea is similar to Parakeet or Numba, the approach differs significantly: code generation is not performed at runtime but offline. Pythran generates heavily templated C++11 code that makes use of the NT2 meta-programming library and relies on any standard-compliant compiler to generate the binary code. We walk through some examples and benchmarks, exposing the current state of what Pythran provides as well as the limits of the approach.

  1. 1. Pythran – C++ for Snakes Static Compiler for High Performance Serge Guelton & Pierrick Brunet & Mehdi Amini PyData - May 4th 2014 Get the slides: http://goo.gl/6dgra0
  4. 4. 1 33 Disclaimer Timings were performed using OSX 10.9, an i7-3740QM CPU @ 2.70GHz, Python 2.7.6, current Pythran (branch Pydata2014), pypy 2.2.1, numba 0.13.1, gcc 4.8, Clang r207887. I am not a Pythonista, but I’m interested in performance in general. Daily job: driver-level C code, assembly, multi-threaded C++, GPU, ... Note: I love to troll, so let’s have a beer later and talk about how awful Python is! ;-) By the way, this talk is written in LaTeX and takes more than 10 seconds to compile.
  6. 6. 2 33 Prototyping Tools for Scientific Computing
  7. 7. 3 33 Tools for Scientific Computing in Python FORTRAN + f2py C + SWIG
  8. 8. 4 33 I Do not Know Much About Python But it Does not Matter Because... I only care about the white spot.
  9. 9. 5 33 Regular IPython Session
     >>> import numpy as np
     >>> def rosen(x):
     ...     return sum(100.*(x[1:]-x[:-1]**2.)**2.
     ...                + (1-x[:-1])**2.)
     >>> r = np.random.rand(100000)
     >>> %timeit rosen(r)
     10 loops, best of 3: 35.1 ms per loop
     In mathematical optimization, the Rosenbrock function is a non-convex function used as a performance test problem for optimization algorithms, introduced by Howard H. Rosenbrock in 1960. It is also known as Rosenbrock’s valley or Rosenbrock’s banana function. (Wikipedia)
  11. 11. 6 33 IPython Session with Pythran
     >>> %load_ext pythranmagic
     >>> %%pythran
     import numpy as np
     #pythran export rosen(float[])
     def rosen(x):
         return sum(100.*(x[1:]-x[:-1]**2.)**2.
                    + (1-x[:-1])**2.)
     >>> import numpy as np
     >>> r = np.random.rand(100000)
     >>> %timeit rosen(r)
     10000 loops, best of 3: 121 us per loop
     That’s a ×290 speedup!
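     Outside IPython, the same kernel can be compiled ahead of time with the pythran command-line tool; a minimal sketch, with the file name chosen just for the example:

         # rosen.py -- a plain Python module; it still runs unmodified under CPython
         import numpy as np

         #pythran export rosen(float[])
         def rosen(x):
             return sum(100.*(x[1:]-x[:-1]**2.)**2. + (1-x[:-1])**2.)

         # Compiling it offline (shell):   pythran rosen.py
         # produces a native extension module that `import rosen` then picks up.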
  12. 12. 7 33 Pythran’s Meta Program Generator
     Pipeline (originally shown as a diagram): a Python Module [.py], possibly carrying OpenMP annotations, goes through Pythran into a C++ meta-program; that meta-program, together with type info and the boost::python, pythonic++ and NT2 libraries, is compiled by g++ into a Native Module [.so].
  13. 13. 8 33 Pythran Motto
     Goals:
     1. Performance First: sacrifice language features
     2. Subset of Python: backward compatibility matters; pythran ≈ python - useless stuff (i.e. class, eval, introspection, polymorphic variables)
     3. Modular Compilation: focus on numerical kernels
     Means:
     1. Static Compilation
     2. Static Optimizations
     3. Vectorization & Parallelization
  14. 14. 9 33 A Story of Python & C++
     def dot(l0, l1):
         return sum(x*y for x, y in zip(l0, l1))

     template <class T0, class T1>
     auto dot(T0&& l0, T1&& l1) -> decltype(/* ... */)
     {
       return pythonic::sum(
         pythonic::map(
           operator_::multiply(),
           pythonic::zip(std::forward<T0>(l0), std::forward<T1>(l1))
         )
       );
     }
  15. 15. 10 33 C++ as a Back-end
     A High Level Language: rich Standard Template Library, Object Oriented Programming, meta-programming, glimpses of type inference (C++11), variadic templates (C++11).
     A Low Level Language: few hidden costs ("you don’t pay for what you don’t use"), direct access to vector instruction units (SSE, AVX, ...).
     Good for Parallel Programming: OpenMP, TBB, ... (and the parallel STL is proposed for C++17).
  17. 17. 11 33 Let’s Dive Into the Backend Runtime... I’m Kidding!
     template <class Op, class S0, class... Iters>
     auto map_aux(Op& op, S0 const& seq, Iters... iters)
       -> sequence<decltype(op(*seq.begin(), *iters...))>
     {
       decltype(_map(op, seq, iters...)) s;
       auto iter = std::back_inserter(s);
       for (auto& iseq : seq)
         *iter++ = op(iseq, *iters++...);
       return s;
     }

     template <class Op, class S0, class... SN>
     auto map(Op op, S0 const& seq, SN const&... seqs)
       -> decltype(_map(op, seq, seqs.begin()...))
     {
       return _map(op, seq, seqs.begin()...);
     }
  18. 18. 12 33 Static Compilation
     Buys time for many time-consuming analyses: points-to, use-def, variable scope, memory effects, function purity...
     Unleashes powerful C++ optimizations: lazy evaluation, map parallelization, constant folding.
     Requires static type inference:
         #pythran export foo(int list, float)
     Only annotate exported functions!
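     A small illustration of that last point (module and function names are made up for the example): a helper called only from the exported entry point needs no export line of its own, since its types are inferred from the exported signature:

         # kernel.py -- hypothetical example module
         def _scale(x, alpha):          # internal helper: no #pythran export needed,
             return x * alpha + 1.0     # its types are inferred from its callers

         #pythran export relax(float list, float)
         def relax(values, alpha):
             return [_scale(v, alpha) for v in values]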
  19. 19. 13 33 Pythran’s Motto 1/3 Gather as much information as possible (typing is just one piece of information among others)
  20. 20. 14 33 Example of Analysis: Points-to
     def foo(a, b):
         c = a or b
         return c*2
     Where does c point to?
  21. 21. 15 33 Example of Analysis: Argument Effects
     def fib(n):
         return n if n < 2 else fib(n-1) + fib(n-2)

     def bar(l):
         return map(fib, l)

     def foo(l):
         return map(fib, random.sample(l, 3))
     Do fib, bar and foo update their arguments?
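     For illustration only, a toy version of this analysis can be written against Python 3's ast module: flag a parameter as updated when the body mutates it through indexing, attribute assignment, an in-place operator, or an obviously mutating method. The MUTATING_METHODS whitelist and the function name are inventions of this sketch, which is far cruder than Pythran's interprocedural analysis:

         import ast

         # methods treated as mutating their receiver -- an assumption of this sketch
         MUTATING_METHODS = {"append", "extend", "insert", "pop", "remove", "sort", "clear"}

         def updated_arguments(source):
             """Map each function name to the set of parameters it may modify in place."""
             result = {}
             for func in ast.walk(ast.parse(source)):
                 if not isinstance(func, ast.FunctionDef):
                     continue
                 params = {a.arg for a in func.args.args}
                 dirty = set()
                 for node in ast.walk(func):
                     if isinstance(node, ast.Assign):
                         # p[i] = ... or p.attr = ... mutates the object bound to p
                         for t in node.targets:
                             if (isinstance(t, (ast.Subscript, ast.Attribute))
                                     and isinstance(t.value, ast.Name)
                                     and t.value.id in params):
                                 dirty.add(t.value.id)
                     elif isinstance(node, ast.AugAssign):
                         # p += ... is in-place for mutable containers, flag it conservatively
                         if isinstance(node.target, ast.Name) and node.target.id in params:
                             dirty.add(node.target.id)
                     elif isinstance(node, ast.Call):
                         # p.append(...), p.sort(), ...
                         f = node.func
                         if (isinstance(f, ast.Attribute) and isinstance(f.value, ast.Name)
                                 and f.value.id in params and f.attr in MUTATING_METHODS):
                             dirty.add(f.value.id)
                 result[func.name] = dirty
             return result

         # For the three functions above, every set comes back empty:
         # fib, bar and foo never update their arguments.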
  22. 22. 16 33 Example of Analysis: Pure Functions
     def f0(a):
         return a**2

     def f1(a):
         b = f0(a)
         print b
         return b

     l = list(...)
     map(f0, l)
     map(f1, l)
     Are f0 and f1 pure functions?
  23. 23. 17 33 Example of Analysis: Use-Def Chains
     a = '1'
     if cond:
         a = int(a)
     else:
         a = 3
     print a
     a = 4
     Which version of a is seen by the print statement?
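     One way to picture the chains is to number each definition of a (annotations added purely for illustration):

         a = '1'          # def 1
         if cond:
             a = int(a)   # def 2, the only use of def 1
         else:
             a = 3        # def 3
         print a          # sees def 2 or def 3, never def 1
         a = 4            # def 4: dead if nothing reads a afterwards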
  24. 24. 18 33 Pythran’s Motto 2/3 Gather as much information as possible. Turn it into code optimizations!
  25. 25. 19 33 Example of Code Optimization: False Polymorphism
     a = cos(1)
     if cond:
         a = str(a)
     else:
         a = None
     foo(a)
     Is this code snippet statically typable?
  26. 26. 20 33 Example of Code Optimization: Lazy Iterators
     def valid_conversion(n):
         l = map(math.cos, range(n))
         return sum(l)

     def invalid_conversion(n):
         l = map(math.cos, range(n))
         l[0] = 1
         return sum(l) + max(l)
     Which map can be converted into an imap?
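     Written out by hand, the first function after that rewrite would look like this (Python 2, to match the deck; itertools.imap yields values on demand instead of building a list):

         import math
         from itertools import imap  # Python 2; in Python 3, map is already lazy

         def valid_conversion(n):
             l = imap(math.cos, range(n))   # no intermediate list is materialized
             return sum(l)

         # invalid_conversion cannot be rewritten this way: `l[0] = 1` needs a real
         # list, and an iterator supports neither indexing nor a second traversal.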
  27. 27. 21 33 Example of Code Optimization: Constant Folding
     def esieve(n):
         candidates = range(2, n+1)
         return sorted(
             set(candidates)
             - set(p*i for p in candidates for i in range(p, n+1))
         )

     cache = esieve(100)
     Can we evaluate esieve at compile time?
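     Since esieve is pure and its argument is a literal, the call can be evaluated once at build time; conceptually, the compiled module just embeds the result (shown here as equivalent Python, not Pythran's actual output):

         # what `cache = esieve(100)` folds down to: the primes up to 100
         cache = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47,
                  53, 59, 61, 67, 71, 73, 79, 83, 89, 97]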
  28. 28. 22 33 Pythran’s Motto 3/3 Gather as much information as possible. Turn it into code optimizations. Vectorize! Parallelize!
  29. 29. 23 33 Explicit Parallelization
     def hyantes(xmin, ymin, xmax, ymax, step, range_, range_x, range_y, t):
         pt = [[0]*range_y for _ in range(range_x)]
         #omp parallel for
         for i in xrange(range_x):
             for j in xrange(range_y):
                 s = 0
                 for k in t:
                     tmp = 6368.*math.acos(
                         math.cos(xmin+step*i)*math.cos(k[0])
                         * math.cos((ymin+step*j)-k[1])
                         + math.sin(xmin+step*i)*math.sin(k[0]))
                     if tmp < range_:
                         s += k[2] / (1+tmp)
                 pt[i][j] = s
         return pt

     Tool     CPython   Pythran   OpenMP
     Timing   639.0ms   44.8ms    11.2ms
     Speedup  ×1        ×14.2     ×57.7
  30. 30. 24 33 Library Level Optimizations
     Numpy is the key: the basic block of Python scientific computing, high-level array manipulation, many common functions implemented.
     1. This smells like FORTRAN
     2. For the compiler guy, FORTRAN smells good
     3. Unlock vectorization & parallelization of Numpy code!
     Cython is known to be "as fast as C"; the only way we found to beat it is to be "as fast as Fortran", hence: Pythran
  31. 31. 25 33 Efficient Numpy Expressions
     Expression Templates
     1. A classic C++ meta-programming optimization
     2. Brings lazy evaluation to C++
     3. Equivalent to loop fusion
     More optimizations: vectorization through boost::simd and nt2, parallelization through #pragma omp
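     A Python-level picture of what loop fusion buys (illustrative only; in Pythran the fusion happens in the generated C++ through expression templates, not in Python):

         import numpy as np

         b, c, d = (np.random.rand(1000) for _ in range(3))

         # Naive evaluation of a = b + c*d: an intermediate array and two passes.
         tmp = c * d
         a = b + tmp

         # What expression templates amount to: a single fused pass, no temporary.
         a_fused = np.empty_like(b)
         for i in range(b.size):
             a_fused[i] = b[i] + c[i] * d[i]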
  32. 32. 26 33 Julia Set, a Cython Example
     def run_julia(cr, ci, N, bound, lim, cutoff):
         julia = np.empty((N, N), np.uint32)
         grid_x = np.linspace(-bound, bound, N)
         t0 = time()
         #omp parallel for
         for i, x in enumerate(grid_x):
             for j, y in enumerate(grid_x):
                 julia[i, j] = kernel(x, y, cr, ci, lim, cutoff)
         return julia, time() - t0
     From Scipy2013 Cython Tutorial.

     Tool     CPython   Cython   Pythran   +OpenMP
     Timing   3630ms    4.3ms    3.71ms    1.52ms
     Speedup  ×1        ×837     ×970      ×2368
  33. 33. 27 33 Mandelbrot, a Numba example
     @autojit
     def mandel(x, y, max_iters):
         """
         Given the real and imaginary parts of a complex number,
         determine if it is a candidate for membership in the
         Mandelbrot set given a fixed number of iterations.
         """
         i = 0
         c = complex(x, y)
         z = 0.0j
         for i in range(max_iters):
             z = z**2 + c
             if abs(z)**2 >= 4:
                 return i
         return 255

     @autojit
     def create_fractal(min_x, max_x, min_y, max_y, image, iters):
         height = image.shape[0]
         width = image.shape[1]
         pixel_size_x = (max_x - min_x) / width
         pixel_size_y = (max_y - min_y) / height
         for x in range(width):
             real = min_x + x * pixel_size_x
             for y in range(height):
                 imag = min_y + y * pixel_size_y
                 color = mandel(real, imag, iters)
                 image[y, x] = color
         return image

     Tool     CPython   Numba   Pythran
     Timing   8170ms    56ms    47.2ms
     Speedup  ×1        ×145    ×173
     ("g++ -Ofast" here)
  34. 34. 28 33 Nqueens, to Show Some Cool Python
     # Pure-Python implementation of
     # itertools.permutations()
     # Why? Because we can :-)
     def permutations(iterable, r=None):
         pool = tuple(iterable)
         n = len(pool)
         if r is None:
             r = n
         indices = range(n)
         cycles = range(n-r+1, n+1)[::-1]
         yield tuple(pool[i] for i in indices[:r])
         while n:
             for i in reversed(xrange(r)):
                 cycles[i] -= 1
                 if cycles[i] == 0:
                     indices[i:] = indices[i+1:] + indices[i:i+1]
                     cycles[i] = n - i
                 else:
                     j = cycles[i]
                     indices[i], indices[-j] = indices[-j], indices[i]
                     yield tuple(pool[i] for i in indices[:r])
                     break
             else:
                 return

     #pythran export n_queens(int)
     def n_queens(queen_count):
         """N-Queens solver.
         Args:
             queen_count: the number of queens to solve for.
                 This is also the board size.
         Yields:
             Solutions to the problem. Each yielded value looks like
             (3, 8, 2, ..., 6) where each number is the column position
             for the queen, and the index into the tuple indicates the row.
         """
         out = list()
         cols = range(queen_count)
         for vec in permutations(cols, None):
             if queen_count == len(set(vec[i]+i for i in cols)):
                 out.append(vec)
         return out

     Solving the NQueens problem, using generators, generator expressions, list comprehensions, sets...
     http://code.google.com/p/unladen-swallow/

     Tool     CPython    PyPy      Pythran
     Timing   2640.6ms   501.1ms   693.3ms
     Speedup  ×1         ×5.27     ×3.8
  35. 35. 29 33 Numpy Benchmarks https://github.com/serge-sans-paille/numpy-benchmarks/ Made with Matplotlib last night! Debian clang version 3.5-1 exp1, x86-64, i7-3520M CPU @ 2.90GHz. No OpenMP/Vectorization for Pythran.
  36. 36. 30 33 Crushing the Head of the Snake I would have titled it ”What every Pythonista should know about Python!” (hopefully we’ll get the video soon).
  43. 43. 31 33 The Compiler Is not the Solution to Keyboard-Chair Interface, Is It?
     def total(arr):
         s = 0
         for j in range(len(arr)):
             s += arr[j]
         return s

     def varsum(arr):
         vs = 0
         for j in range(len(arr)):
             mean = (total(arr) / len(arr))
             vs += (arr[j] - mean) ** 2
         return vs

     #pythran export stddev(float64 list list)
     def stddev(partitions):
         ddof = 1
         final = 0.0
         for part in partitions:
             m = total(part) / len(part)
             # Find the mean of the entire group.
             gtotal = total([total(p) for p in partitions])
             glength = total([len(p) for p in partitions])
             g = gtotal / glength
             # varsum(part) + adj == sum((x - g)**2 for x in part)
             adj = ((2 * total(part) * (m - g))
                    + ((g ** 2 - m ** 2) * len(part)))
             final += varsum(part) + adj
         return math.sqrt(final / (glength - ddof))

     Version   Awful   Less Awful   OK       Differently OK   OK       Numpy
     CPython   127s    150ms        54.3ms   53.8ms           47.6ms   8.2ms
     Pythran   1.38s   4.7ms        4.8ms    5.8ms            4.7ms    1.7ms
  44. 44. 32 33 Engineering Stuff
     Get It: follow the source at https://github.com/serge-sans-paille/pythran (we are waiting for your pull requests).
     Debian repo: deb http://ridee.enstb.org/debian unstable main
     Join us: #pythran on Freenode (very active), or pythran@freelists.org (not very active).
     Available on PyPI, using Python 2.7, 2000+ test cases, PEP8 approved, clang++ & g++ (>= 4.8) friendly, Linux and OSX validated.
  52. 52. 33 Conclusion
     Compiling Python means more than typing and translating.
     What next:
     Release the PyData version (ooops, we’re late... I did not specify which year)
     User module import (pull request already issued)
     Set up a daily performance regression test bot (ongoing work with codespeed)
     Re-enable vectorization through boost::simd and nt2 (soon)
     More Numpy support, start looking into Scipy
     Polyhedral transformations (in another life...)
     http://pythonhosted.org/pythran/