is it ready for production?          Mark Rees         Group CTO    Censof Holdings Berhad
PyPy - is it ready for production

Please also read the slide notes from slide 5 onwards to get a better understanding of how the analysis was done and of the outcome.

Published in: Technology
  • I have listed a number of resources that I found helpful, but this talk is more about using pypy than about how it works.
  • The first 7 criteria came from a question on stackexchange; the last 2 are my additional requirements. It is a slightly more detailed definition than the management version: it runs, it makes money. You may disagree with the list, but these are the criteria I will be using. I will also be biased towards the needs of the company I work for. So let’s work through the list to see how pypy stacks up.
  • It runs great on x86 32bit and 64bit platforms under Linux, Windows and OS X. There are other backend implementations: ARM, PPC, Java & .NET VMs. Some have had more love than others. Pypy implements the Python language version 2.7.2, supporting all of the core language and passing the Python test suite. It supports most of the standard library modules. It has support for the CPython C API, but that support is beta quality. I will go into more detail about standard library and other module compatibility later in the talk.
  • I am not a language interpreter designer, so I cannot really comment on the design, but you would assume that after so many years of development & refactoring by the pypy team it is a well thought out design. With regard to maintainability, because much of the pypy toolchain uses RPython and the architecture is complex, I feel it is hard for the normal python programmer to contribute to the coding maintenance of pypy. The learning curve is steep, but the pure-python portions of the pypy components are certainly easier to maintain.
  • As I said before, pypy implements python language version 2.7.2.
  • As of pypy 1.9, c-api support is considered beta, and while it worked for many of the modules we use, e.g. PIL, it failed with the c extensions for reportlab. This wasn’t a show-stopper, as these extensions also have python equivalents in the standard reportlab distribution. Of course, our python library use will be different from yours, so your experience will be different as well.
  • The above plot represents PyPy trunk (with JIT) benchmark times normalized to CPython as at 12 August 2012. Smaller is better. The standard benchmarks are limited to one domain and in a lot of cases do not cover complete processes or workloads. For example:
  • The django benchmark is in the standard pypy benchmark suite and was originally part of the unladen swallow benchmarks. This benchmark only tests the template rendering performance of django. There is nothing wrong with this; it’s a standard benchmark technique. So if you see the results of this benchmark, it’s likely that django template rendering under pypy would be faster than under cpython. Does this mean your django website performance would be better? Maybe.
  • My benchmarks are a little different from the standard pypy ones, as they simulate workloads similar to what we use python for at work. Rather than benchmarking a small portion or function as the standard benchmarks do, mine cover either a complete process or the majority of one. So my benchmarks are impacted by io as well as by in-program execution. Since the majority of the non-web use of python in our workplace is extract/transform/load (ETL) tasks, this is what the benchmarks are doing.
  • To perform the benchmarks, I cloned the pypy benchmark tools and added my benchmarks to them. You can see these at https://bitbucket.org/hexdump42/pypy-benchmarks. The benchmarks were run on a VMware virtual instance with 2GB RAM and one 64bit core running Scientific Linux 6.2. The base CPython used was 2.7.2, and comparison benchmarks were run against pypy-jit release 1.9 and the nightly pypy-jit build of August 14 2012, collecting average execution time and memory use over benchmark runs of 50 iterations. For the bm_csv2xml benchmark, a 100Mb csv file of census data is loaded, parsed and output as xml to a file. It is faster than cpython, so things are looking good. But I had hoped it would be a little better. So
  • I created a benchmark of just the csv load and parse and was surprised to see that it was slower than the cpython equivalent, so in my previous benchmark the xml output was what gave the improved performance under pypy.
  • The bm_interp benchmark just provides a baseline of the memory the interpreter alone uses prior to any real work. Just in case these benchmark results were caused by something in my vm configuration, I also reran these benchmarks on physical hardware and obtained similar results. If I had stopped here, you would have to say that pypy didn’t meet my production criteria, but since some of the components that affect the performance are written in python under pypy, I decided to see why performance wasn’t the same as or better than cpython. I decided to start with the low hanging fruit: csv performance.
  • You can use the pypy jitviewer to see what is happening, and of course I can review the source of _csv.py since it’s written in pure python. Thanks to some input in the pypy issue tracker https://bugs.pypy.org/issue641
  • I was able, after a number of attempts, to modify _csv.py so that the bm_csv benchmark performed at the same speed as cpython. This also gave a small performance improvement in the bm_csv2xml benchmark. Based on these improvements, it is very likely we will use pypy in place of cpython for the ETL where we load csv files and convert to xml. I also intend to investigate where the performance bottlenecks are in the other ETL process benchmarks to see if we can get gains similar to what we get with pypy for the bm_csv2xml benchmark.
  • If we revisit the definition of production ready and use just items 1-7 as the criteria, pypy is certainly production ready when compared with other python implementations that are being used in production. If you want to run existing python code under pypy, then pypy compatibility with non-standard python libraries needs to be considered, and getting your hands dirty by running the code under pypy is really the best way to see if pypy will work. If nothing else, you can report an issue to the pypy team and they can use it to improve compatibility. And will our company be deploying anything in production under pypy? It is likely that sometime this year we will look at deploying it for certain ETL workloads due to the measured benchmark performance. The additional memory overhead isn’t an issue for us. So my recommendation is that if you are looking for performance improvements, give pypy a go; you may be surprised.
  • But performance shouldn’t be the only reason to consider pypy; there are various pypy side projects that will have good benefits for the python community as a whole. Last week the pypy team released cffi, a Foreign Function Interface for calling C code from Python. The aim of this project is to provide a convenient and reliable way of calling C code from Python. It works with both pypy and cpython 2.6+. The pypy team are working on a pypy implementation of numpy and are close to a py3k language compliant version. If you want to help with pypy, check out the howto help page & the donation page.
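The measurement approach described in the notes above (run a workload a fixed number of times, average the execution time, then express one interpreter's average as a ratio of the other's) could be sketched roughly as follows. This is only an illustration and not the code from the pypy-benchmarks repository: the function names, the made-up column names and the in-memory CSV data are all assumptions, and it is written for Python 3 rather than the 2.7 interpreters used in the talk.

```python
import csv
import io
import time


def time_runs(workload, iterations=50):
    """Call workload() `iterations` times; return the elapsed time of each call."""
    times = []
    for _ in range(iterations):
        t0 = time.perf_counter()
        workload()
        times.append(time.perf_counter() - t0)
    return times


def average(times):
    """Average execution time in seconds."""
    return sum(times) / len(times)


def ratio_label(baseline_avg, candidate_avg):
    """Describe the candidate relative to the baseline, e.g. '2.0000 x faster'."""
    if candidate_avg <= baseline_avg:
        return "%.4f x faster" % (baseline_avg / candidate_avg)
    return "%.4f x slower" % (candidate_avg / baseline_avg)


def make_csv_parse_workload(rows=1000):
    """Hypothetical stand-in for a csv-only benchmark: parse an in-memory CSV."""
    fieldnames = ["age", "sex", "region"]  # made-up column names
    text = "".join("%d,M,north\n" % (i % 90) for i in range(rows))

    def workload():
        reader = csv.DictReader(io.StringIO(text), fieldnames)
        return sum(1 for _ in reader)  # number of rows parsed

    return workload


if __name__ == "__main__":
    times = time_runs(make_csv_parse_workload(), iterations=5)
    print("average seconds:", average(times))
```

Running the same script under each interpreter and passing the two averages to `ratio_label` gives a figure in the same style as the benchmark tables. Unlike the benchmarks in the talk, this workload does no file io, so it only exercises the parsing step.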

    1. is it ready for production? Mark Rees Group CTO Censof Holdings Berhad
    2. pypy & me
       not affiliated with pypy team
       have followed its development since 2004
       use cpython and jython at work
       used ironpython for small projects
       the question: would pypy improve performance of some of our workloads?
       i am a manager who still wants to be a programmer, so i did the analysis
    3. pypy
       what is pypy?
       - RPython translation toolchain, a framework for generating dynamic programming language implementations
       - an implementation of Python in Python using the framework
       history
       - first sprint 2003, EU project from 2004 – 2007
       - open source project from 2007 https://bitbucket.org/pypy
       - pypy 1.4 first release suitable for “production” 12/2010
    4. pypy
       want to know more about pypy
       - http://pypy.org/
       - david beazley pycon 2012 keynote http://goo.gl/5PXFQ
       - how the pypy jit works http://goo.gl/dKgFp
       - why pypy by example http://goo.gl/vpQyJ
    5. production ready – a definition
       http://programmers.stackexchange.com/questions/61726/define-production-ready
       - it runs
       - it satisfies the project requirements
       - its design was well thought out
       - it's stable
       - it's maintainable
       - it's scalable
       - it's documented
       - it works with the python modules we use
       - it is as fast or faster than cpython
    6. pypy – does it run?
       of course, it runs
       See http://pypy.readthedocs.org/en/latest/cpython_differences.html for differences between PyPy and CPython
    7. pypy – other production criteria
       does it satisfy the project requirements - yes
       is its design well thought out - I would assume so
       is it stable - yes
       is it maintainable - 7 out of 10
       is it scalable - stackless & greenlets built in
       is it documented - cpython docs for functionality, rpython toolchain 8 out of 10
    8. pypy – does it work with the modules we use
       standard library modules supported: __builtin__, __pypy__, _ast, _bisect, _codecs, _collections, _ffi, _hashlib, _io, _locale, _lsprof, _md5, _minimal_curses, _multiprocessing, _random, _rawffi, _sha, _socket, _sre, _ssl, _warnings, _weakref, _winreg, array, binascii, bz2, cStringIO, clr, cmath, cpyext, crypt, errno, exceptions, fcntl, gc, imp, itertools, marshal, math, mmap, operator, oracle, parser, posix, pyexpat, select, signal, struct, symbol, sys, termios, thread, time, token, unicodedata, zipimport, zlib
       these modules are supported but written in python: cPickle, _csv, ctypes, datetime, dbm, _functools, grp, pwd, readline, resource, sqlite3, syslog, tputil
       many python libs are known to work, like: ctypes, django, pyglet, sqlalchemy, PIL. See https://bitbucket.org/pypy/compatibility/wiki/Home for a more exhaustive list.
    9. pypy – does it work with the modules we use
       pypy c-api support is beta, worked most of the time but failed with reportlab:
           Fatal error in cpyext, CPython compatibility layer, calling PySequence_GetItem
           Either report a bug or consider not using this particular extension
           <OpErrFmt object at 0x7f1e89587e88>
           RPython traceback:
             File "module_cpyext_api_2.c", line 51963, in PySequence_GetItem
             File "module_cpyext_pyobject.c", line 1071, in BaseCpyTypedescr_realize
             File "objspace_std_objspace.c", line 3396, in allocate_instance__W_ObjectObject
             File "objspace_std_typeobject.c", line 3010, in W_TypeObject_check_user_subclass
           Segmentation fault
       But this was the only compatibility issue we had running all of our python code under pypy, and we could fall back to pure python reportlab extensions anyway.
    10. pypy – does it run as fast as cpython but! http://speed.pypy.org/
    11. pypy django benchmark
            # Python 2 code, as shown on the slide (imports added)
            import time
            from django.template import Context, Template

            DJANGO_TMPL = Template("""<table>{% for row in table %}<tr>{% for col in row %}<td>{{ col|escape }}</td>{% endfor %}</tr>{% endfor %}</table>""")

            def test_django(count):
                table = [xrange(150) for _ in xrange(150)]
                context = Context({"table": table})

                # Warm up Django.
                DJANGO_TMPL.render(context)
                DJANGO_TMPL.render(context)

                # Time each render of the 150 x 150 table.
                times = []
                for _ in xrange(count):
                    t0 = time.time()
                    data = DJANGO_TMPL.render(context)
                    t1 = time.time()
                    times.append(t1 - t0)
                return times
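    12. my csv to xml benchmark
            # SAXWriter is the benchmark's own helper class; it is not shown on the slide.
            def bench(data, output):
                f = open(data, 'rb')
                fn = ['age', …]  # remaining field names elided on the slide
                reader = csv.DictReader(f, fn)
                writer = SAXWriter(output)
                writer.start_doc()
                writer.start_tag('data')
                try:
                    # One <row> element per csv record, one child element per field.
                    for row in reader:
                        writer.start_tag('row')
                        for key in row.keys():
                            writer.tag(key.replace(' ', '_'), body=row[key])
                        writer.end_tag('row')
                finally:
                    f.close()
                writer.end_tag('data')
                writer.end_doc()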
    13-16. my pypy benchmarks (one table, built up row by row over four slides)
        https://bitbucket.org/hexdump42/pypy-benchmarks
        average execution time (in seconds)

        benchmark      cpython 2.7.3   pypy-jit 1.9               pypy-jit nightly
        bm_csv2xml     88.26/94.04     28.89 (3.0549 x faster)    28.96 (3.3723 x faster)
        bm_csv         1.54/1.65       5.89 (3.8122 x slower)     5.78 (3.5025 x slower)
        bm_openpyxml   1.31/1.21       3.26 (2.4871 x slower)     3.15 (2.6051 x slower)
        bm_xhtml2pdf   1.91/1.95       3.27 (1.7155 x slower)     4.22 (2.1637 x slower)
    17. my pypy benchmarks
        https://bitbucket.org/hexdump42/pypy-benchmarks
        max memory use

        benchmark      cpython 2.7.3   pypy-jit 1.9                pypy-jit nightly
        bm_interp      5412/5248       12556 (2.32 x larger)       21880 (4.1692 x larger)
        bm_csv2xml     7048/7064       55180 (7.8292 x larger)     55232 (7.8188 x larger)
        bm_csv         5812/5180       52200 (8.9814 x larger)     52176 (10.0726 x larger)
        bm_openpyxml   12656/12656     77252 (6.1040 x larger)     80428 (6.3549 x larger)
        bm_xhtml2pdf   48880/34884     236792 (4.8444 x larger)    101376 (2.906 x larger)
    18. what is the pypy jit doing?
        https://bitbucket.org/pypy/jitviewer/
    19. modified csv pypy benchmarks
        https://bitbucket.org/hexdump42/pypy-benchmarks
        average execution time (in seconds)

        benchmark        cpython 2.7.3   pypy-jit 1.9               pypy-jit nightly
        bm_csv2xml_mod   88.25/90.02     23.65 (3.7315 x faster)    23.86 (3.7728 x faster)
        bm_csv_mod       1.62/1.69       1.89 (0.8571 x slower)     1.72 (0.9825 x slower)
    20. is pypy ready for production
        1. it runs
        2. it satisfies the project requirements
        3. its design was well thought out
        4. it's stable
        5. it's maintainable
        6. it's scalable
        7. it's documented
        8. it works with the python modules we use
        9. it is as fast or faster than cpython
    21. some other reasons to consider pypy
        cffi – foreign function interface for python
        - http://cffi.readthedocs.org/
        pypy version of numpy
        py3k version of pypy
        check out the STM/AME project
        http://www.pypy.org/howtohelp.html
    22. contact details
        Mark Rees
        mark at censof dot com
        +Mark Rees
        @hexdump42
        hex-dump.blogspot.com
        http://www.slideshare.net/hexdump42/pypy-is-it-ready-for-production
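The csv to xml benchmark on slide 12 relies on a SAXWriter helper that the slides do not show. As a rough, self-contained approximation of the same workload, the sketch below swaps in the standard library's `xml.sax.saxutils.XMLGenerator`. That substitution, the `csv_to_xml` name and the sample field names are assumptions made for illustration, and the code targets Python 3 rather than the 2.7 interpreters used in the talk.

```python
import csv
import io
from xml.sax.saxutils import XMLGenerator


def csv_to_xml(csv_file, xml_file, fieldnames):
    """Stream rows from csv_file into xml_file as <data><row>...</row></data>."""
    reader = csv.DictReader(csv_file, fieldnames)
    writer = XMLGenerator(xml_file, encoding="utf-8")  # stand-in for SAXWriter
    writer.startDocument()
    writer.startElement("data", {})
    for row in reader:
        writer.startElement("row", {})
        for key in fieldnames:
            tag = key.replace(" ", "_")  # element names cannot contain spaces
            writer.startElement(tag, {})
            writer.characters(row[key] or "")  # short rows yield None
            writer.endElement(tag)
        writer.endElement("row")
    writer.endElement("data")
    writer.endDocument()


if __name__ == "__main__":
    source = io.StringIO("34,New York\n51,Kuala Lumpur\n")
    target = io.StringIO()
    csv_to_xml(source, target, ["age", "home town"])
    print(target.getvalue())
```

Because both the reader and the writer stream one row at a time, memory use stays flat regardless of input size, which is the property that matters for a large input file like the one used in the talk.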
