A Better Python for the JVM
My talk about how we are improving Jython from EuroPython 2009.

Published in: Technology, Education
Transcript

  • 1. A better Python for the JVM Tobias Ivarsson <tobias@thobe.org> twitter: @thobe blog: http://journal.thobe.org
  • 2. $ whoami tobias (Tobias Ivarsson) • M.Sc. in Computer Science and Engineering from Linköping University, Sweden • Jython Committer / Compiler zealot • Java developer at Neo Technology • Ask me why our graph database (Neo4j) kicks ass • Check out http://neo4j.org (it works with Python) • High tech. / low traffic: twitter: @thobe blog: http://journal.thobe.org website: http://www.thobe.org - check for slides
  • 3. • Overview of the “Advanced Compiler” project • Performance figures • Python / JVM mismatch • Getting better • Summary
  • 4. • Overview of the “Advanced Compiler” project • Performance figures • Python / JVM mismatch • Getting better • Summary
  • 5. Project motivation • The ultimate goal is a faster Jython • The new compiler is just a component to get there • Focus is on representation of Python code on the JVM
  • 6. What does Code Representation include? • Function/Method/Code object representation • Scopes: how to store locals and globals • Call frame representation • Affects sys._getframe() • The representation of builtins • Mapping of Python attributes to the JVM
  • 7. Compiler tool chain: Source code → Parser → AST → Analyzer → AST + Code Info per scope → Compiler. The “spine” of the compiler, the main part. This is the same in any compiler in Jython, and similar to other systems, CPython in particular, as well.
  • 8. Compiler tool chain: Source code → Parser → AST → Analyzer → AST + Code Info per scope → Compiler → Java byte code, which runs on the Jython runtime system on the JVM. This is the structure of the compiler in Jython today.
  • 9. Compiler tool chain: Source code → Parser → AST → Analyzer → AST + Code Info per scope → Compiler → IR → Transformer → IR → Codegen → Java byte code, which runs on the Jython runtime system on the JVM. The advanced compiler adds two more steps to the compilation process. The analyzer and compiler steps also change.
  • 10. Compiler tool chain: Source code → Parser → AST → Analyzer → AST + Code Info per scope → Compiler → IR → Transformer → IR → Codegen → Python byte code (run by an Interpreter) or Java byte code, both on the Jython runtime system on the JVM. This flexibility makes it possible to output many different code formats. Even bundle together multiple formats for one module.
  • 11. Compiler tool chain (diagram: the tool chain from the previous slides, with the additional labels “IR” and “IR + runtime info” placed between Codegen, the Java byte code, the Interpreter and the Jython runtime system on the JVM). It is also possible to compile, and re-compile, code with more information from the actual runtime data.
  • 12. The Intermediate Representation • “sea of nodes” style SSA • Control flow and data flow both modeled as edges between nodes • Simplifies instruction re-ordering
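To make the “edges between nodes” idea on slide 12 concrete, here is a minimal sketch of a sea-of-nodes style graph. It is not the Jython advanced compiler's actual IR; the `Node` class, its fields and the `schedule` helper are hypothetical names invented for illustration. The point it tries to show is that any ordering that respects the edges is legal, which is what makes instruction re-ordering simple.

```python
class Node(object):
    """One IR node; data and control dependencies are both plain edges."""

    def __init__(self, op, data_inputs=(), control_input=None):
        self.op = op
        self.data_inputs = list(data_inputs)   # values this node consumes
        self.control_input = control_input     # node that must run before it

    def __repr__(self):
        return "Node(%r)" % (self.op,)


def schedule(roots):
    """Return one legal order: every node comes after all of its inputs."""
    ordered, seen = [], set()

    def visit(node):
        if id(node) in seen:
            return
        seen.add(id(node))
        for dependency in node.data_inputs:
            visit(dependency)
        if node.control_input is not None:
            visit(node.control_input)
        ordered.append(node)

    for root in roots:
        visit(root)
    return ordered


# Hypothetical graph for:  def adder(a, b): return a + b
start = Node("start")
param_a = Node("param a", control_input=start)
param_b = Node("param b", control_input=start)
add = Node("add", data_inputs=[param_a, param_b])
ret = Node("return", data_inputs=[add], control_input=start)

order = schedule([ret])
```

Because `add` only has data edges, a scheduler is free to place it anywhere after both parameters and before the return.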
  • 13. • Overview of the “Advanced Compiler” project • Performance figures • Python / JVM mismatch • Getting better • Summary
  • 14. Parrotbench • 7 tests, numbered b0-b6 • Test b1 omitted • Tests infinite recursion and expects recursion limit exception • Allocates objects while recursing • Not applicable for Jython
  • 15. Running parrotbench • Python 2.6 vs Jython 2.5 (trunk) • Each test executes 3 times, minimum taken • Total process execution time, including startup, is also measured • Jython also tested after JVM JIT warmup • Warmup for about 1 hour... 110 iterations of each test
  • 16. The tests (rough understanding) • b0 parses Python in Python • b2 computes pi • b3 sorts random data • b4 more parsing of Python in Python • b5 tests performance of builtins • b6 creates large simple lists/dicts
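The “3 runs, minimum taken” method from slide 15 can be sketched as below. This is not the parrotbench driver; `best_of` and `sample_workload` are hypothetical names, and the workload is a stand-in. It also measures only in-process time, whereas the totals in the talk include VM startup. `time.perf_counter` assumes Python 3; on Python 2.x or Jython 2.5 `time.time` would be the substitute.

```python
import time


def best_of(func, repeats=3):
    """Run func `repeats` times; return (best, total) wall time in ms."""
    timings = []
    for _ in range(repeats):
        started = time.perf_counter()
        func()
        timings.append((time.perf_counter() - started) * 1000.0)
    return min(timings), sum(timings)


def sample_workload():
    # Stand-in for one benchmark test: sort a reversed sequence.
    return sorted(range(10000, 0, -1))


best_ms, total_ms = best_of(sample_workload)
```

Taking the minimum filters out noise from other processes; on a JIT-compiling VM it also favours the later, better-optimized runs.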
  • 17. Python 2.6
        Test                        Time (ms)
        b0                          1387
        b2                          160
        b3                          943
        b4                          438
        b5                          874
        b6                          1079
        Total* (incl. VM startup)   15085
        * Total time is for three iterations; other times are the best iteration of those three
  • 18. Jython 2.5b (Preview version available at PyCon)
        Test                        Time (ms)              Time (ms)
                                    (without JIT warmup)   (with JIT warmup)
        b0                          4090                   2099
        b2                          202                    107
        b3                          3612                   1629
        b4                          1095                   630
        b5                          3044                   2161
        b6                          2755                   2237
        Total* (incl. VM startup)   51702                  Not applicable
        * Total time is for three iterations; other times are the best iteration of those three
  • 19. Jython 2.5+ (Snapshot from June 24 2009) Jython 2.5.0 Final has an embarrassing performance issue on list multiplication that got introduced when the list implementation was made thread safe.
        Test                        Time (ms)              Time (ms)
                                    (without JIT warmup)   (with JIT warmup)
        b0                          2968                   2460
        b2                          202                    124
        b3                          2255                   2030
        b4                          875                    742
        b5                          4036                   2291
        b6                          2279                   2276
        Total* (incl. VM startup)   57279                  Not applicable
        * Total time is for three iterations; other times are the best iteration of those three
  • 20. CPython 2.6 vs Jython 2.5 (bar chart, time in ms: total runtime and runtime excluding VM startup, for Python 2.6, Jython 2.5b and Jython 2.5+) Work on thread safety and compatibility has made Jython *slower* but better. Performance is a later focus.
  • 21. CPython 2.6 vs Jython 2.5 (stacked bar chart, time in ms: tests b0, b2, b3, b4, b5, b6 for Python 2.6, Jython 2.5b with warmup and Jython 2.5+ with warmup) UnJITed performance improved due to lower call overhead and better dict. JITed performance worse due to thread safety fixes.
  • 22. CPython 2.6 vs Jython 2.5 (bar chart, time in ms, per test b0, b2, b3, b4, b5, b6: Python 2.6, Jython 2.5b, Jython 2.5b with warmup, Jython 2.5+, Jython 2.5+ with warmup)
  • 23. Is JRuby faster than Jython? JRuby is a good indicator for the performance we could reach with Jython. It’s a similar language on the same platform. Therefore a comparison and analysis is interesting.
  • 24. Adding two numbers
        # Jython
        def adder(a,b):
            return a+b

        # JRuby
        def adder(a,b)
          a+b
        end
  • 25. Execution times (ms for 400000 additions), without counter (bar chart): Jython 697 ms, JRuby 466 ms
  • 26. Why is JRuby faster? • JRuby has had more work on performance • Jython work has been focused on 2.5 compatibility • Next release will start to target performance • JRuby has a shorter call path • JRuby does Call Site caching
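Call site caching, the last bullet on slide 26, can be illustrated with a small monomorphic inline cache. This is a sketch of the general technique, not JRuby's or Jython's implementation; the `CallSite` and `Adder` classes are hypothetical. It also assumes the receiver's class is never modified after the first call, which a real implementation would have to guard against.

```python
class CallSite(object):
    """Caches the method found for the last receiver type seen here."""

    def __init__(self, name):
        self.name = name
        self.cached_type = None
        self.cached_target = None
        self.hits = 0
        self.misses = 0

    def call(self, receiver, *args):
        receiver_type = type(receiver)
        if receiver_type is self.cached_type:
            # Fast path: one identity check instead of a full lookup.
            self.hits += 1
        else:
            # Slow path: full attribute lookup, then remember the result.
            self.misses += 1
            self.cached_target = getattr(receiver_type, self.name)
            self.cached_type = receiver_type
        return self.cached_target(receiver, *args)


class Adder(object):
    def add(self, a, b):
        return a + b


site = CallSite("add")
adder = Adder()
results = [site.call(adder, i, 1) for i in range(5)]
```

After the first call the site resolves `add` with a single type comparison, which is the kind of shortened call path the slide refers to.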
  • 27. Counting the number of additions - Jython
        from __future__ import with_statement  # required for "with" on Python/Jython 2.5
        from threading import Lock
        count = 0
        lock = Lock()
        def adder(a,b):
            global count
            with lock:
                count += 1
            return a+b
  • 28. Counting the number of additions - JRuby
        class Counting
          def adder(a,b)
            @mutex.synchronize {
              @count = @count + 1
            }
            a + b
          end
        end
  • 29. Execution times (ms for 400000 additions), with counter (bar chart; series: Jython (Lock), JRuby (Mutex), Jython (AtomicInteger); bar labels in extraction order: 46,960 ms, 4,590 ms, 2,981 ms) I included AtomicInteger to verify that the problem was with the synchronization primitives.
  • 30. Why is JRuby faster? • JRuby has had more work on performance • JRuby has lower call overhead • JRuby Mutex is easier for the JVM to optimize than Jython Lock • Because of JRuby's use of closures
  • 31. Call overhead comparison
        Jython: Python wrapper around Java primitives • Call to Python code • Reflective Java call • Lock • Execute actual code • Call to Python code • Reflective Java call • Unlock
        JRuby: Java code implementing the Ruby logic • Lock • Direct call to closure • Unlock
  • 32. • Overview of the “Advanced Compiler” project • Performance figures • Python / JVM mismatch • Getting better • Summary
  • 33. Call frames • A lot of Python code depends on reflecting call frames • Every JVM has call frames, but only exposes them to debuggers • Current Jython is naïve about how frames are propagated • Simple prototyping hints at up to 2x boost
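As an illustration of what “reflecting call frames” on slide 33 means in practice, the sketch below uses `sys._getframe`, which the talk itself mentions. The helper names `caller_name`, `read_caller_local` and `handler` are made up for the example. Code like this forces the runtime to keep a Python-level frame object alive for every call, because any callee might ask for it.

```python
import sys


def caller_name():
    # Frame 0 is this function; frame 1 is whoever called it.
    return sys._getframe(1).f_code.co_name


def read_caller_local(name):
    # Reach into the caller's local variables through its frame.
    return sys._getframe(1).f_locals[name]


def handler():
    secret = 42
    return caller_name(), read_caller_local("secret")


result = handler()
```

Here `handler` never passes `secret` or its own name to the helpers, yet both are recovered from the frame.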
  • 34. Extremely late binding • Every binding can change • The module scope is volatile • Even builtins can be overridden
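The three bullets on slide 34 can be demonstrated directly. The sketch below is illustrative only; `total_length` is a hypothetical function. It assumes Python 3, where the builtins module is called `builtins`; on Python 2.x and Jython 2.5 it is `__builtin__`.

```python
import builtins


def total_length(items):
    # "len" is looked up on every call: module globals first, then builtins.
    return sum(len(item) for item in items)


before = total_length(["ab", "cde"])

# Even builtins can be overridden, and existing functions see the change.
original_len = builtins.len
builtins.len = lambda obj: 1
try:
    after_builtin_override = total_length(["ab", "cde"])
finally:
    builtins.len = original_len

# The module scope is volatile: a new global can shadow the builtin later.
globals()["len"] = lambda obj: 100
try:
    after_global_override = total_length(["ab", "cde"])
finally:
    del globals()["len"]

restored = total_length(["ab", "cde"])
```

A compiler therefore cannot assume that `len` in `total_length` means the builtin without some way to detect that the binding has changed.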
  • 35. Exception handling • Exception type matching in Python is a sequential comparison • Exception type matching in the JVM is done on exact type by the VM • Exception types are specified as arbitrary expressions • No way of mapping Python try/except directly to the JVM
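The points about Python exception matching on slide 35 can be seen in a few lines. The functions `pick_handled_types` and `lookup` are hypothetical. The example shows an `except` clause whose exception types come from an arbitrary expression evaluated at run time, and clauses being tried one after another in source order.

```python
def pick_handled_types(strict):
    # The set of handled exception types is computed at run time.
    return (KeyError,) if strict else (KeyError, IndexError)


def lookup(container, key, strict):
    try:
        return container[key]
    except pick_handled_types(strict):
        # First clause: evaluated and tested first.
        return "missing"
    except LookupError:
        # Second clause: only tested if the first did not match.
        return "unhandled lookup error"


key_error_strict = lookup({}, "x", strict=True)
index_error_lenient = lookup([], 3, strict=False)
index_error_strict = lookup([], 3, strict=True)
```

Since the handled types are not known until `pick_handled_types` runs, they cannot be written into a static JVM exception table ahead of time.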
  • 36. Blocks of Code • The JVM has a size limit • The JVM JIT has an even smaller size limit
  • 37. • Overview of the “Advanced Compiler” project • Performance figures • Python / JVM mismatch • Getting better • Summary
  • 38. Call frames • Analyze code - omit unnecessary frames • Fall back to Java frames for pdb et al. • Treat locals, globals, dir, exec, eval as special • Pass state - avoid centrally stored state • sys._getframe() is an implementation detail
  • 39. Late binding • Ignore it and provide a fail path • Inline builtins • Turn for i in range(...): ... into a Java loop • Do direct invocations to members of the same module
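“Ignore it and provide a fail path” on slide 39 can be sketched in plain Python as a guarded fast path. This is only an analogy for what a compiler might emit, not Jython's generated code; `sum_squares_generic` and `sum_squares_optimistic` are hypothetical names, the `while` loop stands in for the Java loop, and `builtins` assumes Python 3.

```python
import builtins

_ORIGINAL_RANGE = builtins.range


def sum_squares_generic(n):
    # Fail path: fully dynamic, honours whatever "range" is bound to now.
    total = 0
    for i in range(n):
        total += i * i
    return total


def sum_squares_optimistic(n):
    # Guard: the fast path is only valid while "range" is still the builtin
    # and nothing in the module scope shadows it.
    if builtins.range is _ORIGINAL_RANGE and "range" not in globals():
        i, total = 0, 0
        while i < n:            # stands in for a plain counted loop
            total += i * i
            i += 1
        return total
    return sum_squares_generic(n)


fast = sum_squares_optimistic(10)

# If someone rebinds the builtin, the guard fails and semantics are kept.
builtins.range = lambda n: [1, 2]
try:
    fallback = sum_squares_optimistic(10)
finally:
    builtins.range = _ORIGINAL_RANGE
```

The common case pays for one cheap check, and the uncommon case still behaves exactly as late binding requires.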
  • 40. JVM Code analysis • Create faux closures • Separate code blocks that evaluate in same scope • Will also help with the code size limit
  • 41. Exception handling • The same late binding optimizations + optimistic exception handler restructuring get us far
  • 42. Reaping the fruits of the future JVMs • Invokedynamic can perform most optimistic direct calls and provide the fail path • Interface injection makes all Java objects look like Python objects • This further improves integration between different dynamic languages • The advanced compiler makes a perfect platform for integrating this
  • 43. • Overview of the “Advanced Compiler” project • Performance figures • Python / JVM mismatch • Getting better • Summary
  • 44. The “Advanced Jython compiler” project • Not just a compiler - but everything close to the compiler - code representation • A platform for moving forward • First and foremost an enabling tool • Actual improvement happens elsewhere
  • 45. Performance • Jython has decent performance • On some benchmarks Jython is better • For single threaded applications CPython is still slightly better • Don’t forget: Jython can do threading • Long running applications benefit from the JVM - Jython is for the server side • We are only getting started...
  • 46. Python / JVM mismatch - Getting better - • Most of the problems come from trying to mimic CPython too closely • Future JVMs are a better match • Break code into smaller chunks • Shorter call paths • Optimistic optimizations are the way to go
  • 47. Thank you! Questions? Tobias Ivarsson <tobias@thobe.org> twitter: @thobe blog: http://journal.thobe.org
