A better Python
  for the JVM
Tobias Ivarsson <tobias@thobe.org>
Hello, my name is...

• ...Tobias Ivarsson
• Jython Committer / Compiler geek
• Java developer at Neo Technology
  Ask me about our graph database - Neo4j
  (it works with Python)
• Overview of the “Advanced Compiler”
  project

• Performance figures
• Python / JVM mismatch
• Getting better
• Summary
Overview of the “Advanced Compiler” project

Project motivation

• The ultimate goal is a faster Jython
• The new compiler is just a component to
  get there
• Focus is on representation of Python code
  on the JVM
What does Code
Representation include?
• Function/Method/Code object
  representation
• Call frame representation
 • Affects sys._getframe()
• Scopes. How to store locals and globals
• The representation of builtins
• Mapping from Python attributes to the JVM
Compiler tool chain
Source code -> Parser -> AST -> Analyzer -> AST + Code Info per scope -> Compiler

The “spine” of the compiler, the main part. This is the same in any
compiler in Jython, and similar to other systems as well, CPython in
particular.
Compiler tool chain
Source code -> Parser -> AST -> Analyzer -> AST + Code Info per scope -> Compiler -> Java byte code
Runs on: Jython runtime system, JVM

This is the structure of the compiler in Jython today.
Compiler tool chain
Source code -> Parser -> AST -> Analyzer -> AST + Code Info per scope -> Compiler -> IR -> Transformer -> IR -> Codegen -> Java byte code
Runs on: Jython runtime system, JVM

The advanced compiler adds two more steps to the compilation process.
The analyzer and compiler steps also change.
Compiler tool chain
Source code -> Parser -> AST -> Analyzer -> AST + Code Info per scope -> Compiler -> IR -> Transformer -> IR -> Codegen -> Java byte code
Also shown: Python byte code -> Interpreter
Runs on: Jython runtime system, JVM

This flexibility makes it possible to output many different code formats,
and even to bundle together multiple formats for one module.
The Intermediate
     Representation

• “sea of nodes” style SSA
 • Control flow and data flow both
    modeled as edges between nodes
 • Simplifies instruction re-ordering
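As a rough illustration of the idea above (not the project's actual IR), a node can carry its data flow and control flow dependencies as two separate kinds of edges. The class `Node`, the helper `floating`, and the node names are all hypothetical. Nodes that have no control edge are constrained only by their inputs, which is why re-ordering becomes simpler.

```python
class Node:
    """One IR node; both kinds of dependency are plain references to other nodes."""

    def __init__(self, op, data=(), control=None):
        self.op = op
        self.data = tuple(data)   # data flow edges: the values this node consumes
        self.control = control    # control flow edge: the node that must run first, if any


def floating(nodes, entry):
    """Return the ops of nodes without a control edge (the entry node excluded).

    Such nodes may be scheduled anywhere after their data inputs.
    """
    return [n.op for n in nodes if n.control is None and n is not entry]


# A tiny graph for: print(2 + 3); return 2 + 3
entry = Node("entry")
two = Node("const 2")
three = Node("const 3")
add = Node("add", data=[two, three])                      # pure: data edges only
call = Node("call print", data=[add], control=entry)      # side effect: pinned by control
ret = Node("return", data=[add], control=call)

nodes = [entry, two, three, add, call, ret]

if __name__ == "__main__":
    print(floating(nodes, entry))
```

In this sketch only the call and the return are pinned to a position in the control chain; the constants and the addition are free to move.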
Performance figures

Parrotbench
• 7 tests, numbered b0-b6
• Test b1 omitted
 • Tests infinite recursion and expects
    recursion limit exception
 • Allocates objects while recursing
 • Not applicable for Jython
Running parrotbench
• Python 2.6 vs Jython 2.5 (trunk)
• Each test executes 3 times, minimum taken
• Total process execution time, including
  startup, also measured
• Jython also tested after JVM JIT warmup
 • Warmup took about 1 hour:
    110 iterations of each test
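The "run 3 times, take the minimum" rule above can be sketched as a small timing helper. This is a minimal sketch, not the harness used for the talk: `best_of` and `sample_workload` are hypothetical names, and the workload is a stand-in rather than a real parrotbench test.

```python
import time


def best_of(func, repeats=3):
    """Run func `repeats` times and return the fastest wall-clock time in ms."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        func()
        timings.append((time.perf_counter() - start) * 1000.0)
    # The minimum is the run least disturbed by other activity on the machine.
    return min(timings)


def sample_workload():
    # Stand-in for a benchmark such as b3 (sorting data); not the real test.
    return sorted(range(10000, 0, -1))


if __name__ == "__main__":
    print("best of 3: %.2f ms" % best_of(sample_workload))
```

Taking the minimum rather than the mean filters out runs slowed down by unrelated system load, but it also hides JIT warmup effects, which is why warmed-up Jython is measured separately.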
The tests
     (rough understanding)
• b0 parses Python in Python
• b2 computes pi
• b3 sorts random data
• b4 more parsing of Python in Python
• b5 tests performance of builtins
• b6 creates large simple lists/dicts
Python 2.6
Test                       Time (ms)
b0                              1387
b2                               160
b3                               943
b4                               438
b5                               874
b6                              1079
Total (incl. VM startup)       15085
Jython 2.5 (trunk)
Test                                 Time (ms)         Time (ms)
                          (without JIT warmup) (with JIT warmup)
b0                                        4090              2099
b2                                         202               107
b3                                        3612              1629
b4                                        1095               630
b5                                        3044              2161
b6                                        2755              2237
Total (incl. VM startup)                 51702    Not applicable
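To make the two tables easier to compare, the per-test slowdown of Jython relative to Python 2.6 can be computed directly from the figures above. The numbers are copied from the tables; the helper name `ratios` is an illustrative choice.

```python
# Per-test times in ms, copied from the tables above.
cpython = {"b0": 1387, "b2": 160, "b3": 943, "b4": 438, "b5": 874, "b6": 1079}
jython_cold = {"b0": 4090, "b2": 202, "b3": 3612, "b4": 1095, "b5": 3044, "b6": 2755}
jython_warm = {"b0": 2099, "b2": 107, "b3": 1629, "b4": 630, "b5": 2161, "b6": 2237}


def ratios(jython, baseline):
    """Jython time divided by CPython time; values above 1.0 mean Jython is slower."""
    return {name: jython[name] / float(baseline[name]) for name in baseline}


if __name__ == "__main__":
    cold = ratios(jython_cold, cpython)
    warm = ratios(jython_warm, cpython)
    for name in sorted(cpython):
        print("%s  without warmup: %.2fx  with warmup: %.2fx"
              % (name, cold[name], warm[name]))
```

Without warmup every test is slower on Jython. With warmup, b2 is the one test where Jython comes out ahead of Python 2.6.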
CPython2.6 vs Jython2.5
[Bar chart: total runtime and runtime excluding VM startup, for Python 2.6 and Jython 2.5; vertical axis 0 to 60,000]
CPython2.6 vs Jython2.5
[Bar chart: tests b0, b2, b3, b4, b5, b6 grouped by Python 2.6, Jython 2.5 and Jython with warmup; vertical axis 0 to 15,000]
CPython2.6 vs Jython2.5
[Bar chart: Python 2.6, Jython 2.5 and Jython with warmup, one group per test b0, b2, b3, b4, b5, b6; vertical axis 0 to 5,000]
What about the
“Advanced Compiler”?
• So far no speedup compared to the “old
  compiler”
• Slight slowdown due to extra compiler step
• Does provide a platform for adding
  optimizations
 • But none of these are implemented yet...
Python / JVM mismatch

Call frames
• A lot of Python code depends on reflecting
  call frames
• Every JVM has call frames, but only exposes
  them to debuggers
• Current Jython is naïve about how frames
  are propagated
  •   Simple prototyping hints at up to 2x boost
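As an example of the kind of frame reflection Python code relies on, the following sketch uses `sys._getframe()` to inspect the caller. The function names are hypothetical; the point is that the runtime has to keep a full Python frame available for any function that might be inspected this way.

```python
import sys


def caller_name():
    """Return the name of the function that called caller_name()."""
    # Depth 0 is this function's own frame, depth 1 is its caller.
    return sys._getframe(1).f_code.co_name


def caller_locals():
    """Return a snapshot of the caller's local variables."""
    return dict(sys._getframe(1).f_locals)


def handler():
    request_id = 42
    return caller_name(), caller_locals()


if __name__ == "__main__":
    print(handler())
```

Nothing in `handler` itself reveals that its locals will be read from outside, which is what makes it hard to decide statically when a frame can be omitted.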
Extremely late binding

• Every binding can change
• The module scope is volatile
 • Even builtins can be overridden
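The bullets above can be demonstrated with a short sketch. It is written for Python 3, where the builtins module is called `builtins` (in Python 2.x it is `__builtin__`); `count_items` and `demo` are hypothetical names. The same call site ends up running three different functions because the name `len` is resolved on every call.

```python
import builtins


def count_items(items):
    # The name `len` is resolved on every call: module globals first, then builtins.
    return len(items)


def demo():
    results = [count_items("abc")]            # resolves to the builtin len

    globals()["len"] = lambda obj: -1         # shadow the builtin at module scope
    try:
        results.append(count_items("abc"))
    finally:
        del globals()["len"]

    original = builtins.len
    builtins.len = lambda obj: 99             # replace the builtin itself
    try:
        results.append(count_items("abc"))
    finally:
        builtins.len = original

    results.append(count_items("abc"))        # back to the real builtin
    return results


if __name__ == "__main__":
    print(demo())
```

A compiler that wants to call `len` directly therefore needs some way to notice when either binding changes.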
Exception handling
• Exception type matching in Python is a
  sequential comparison.
• Exception type matching in the JVM is done
  on exact type by the VM.
• Exception types are specified as arbitrary
  expressions.
  • No way of mapping Python try/except
    directly to the JVM.
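A small sketch of the points above: each `except` clause takes an arbitrary expression, and the clauses are tried in order from top to bottom. The functions `handlers_for` and `lookup` are hypothetical names chosen for the illustration.

```python
def handlers_for(mode):
    # Any expression that evaluates to an exception class, or a tuple of
    # classes, is allowed in an except clause.
    if mode == "strict":
        return (KeyError,)
    return (KeyError, IndexError)


def lookup(container, key, mode):
    try:
        return container[key]
    except handlers_for(mode):     # evaluated only when an exception must be matched
        return "handled by computed clause"
    except LookupError:            # tried only if the first clause did not match
        return "handled by LookupError clause"


if __name__ == "__main__":
    print(lookup([], 0, "strict"))
    print(lookup([], 0, "lenient"))
```

Because the set of matching types is only known at run time, it cannot be written into a JVM exception table, which lists fixed exception classes per handler.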
Getting better

Call frames

• Analyze code - omit unnecessary frames
• Fall back to Java frames for pdb et al.
• Treat locals, globals, dir, exec, eval as special
• Pass state - avoid central stored state
• sys._getframe() is an implementation detail
Late binding

• Ignore it and provide a fail path
 • Inline builtins
    • Turn for i in range(...): ... into a Java loop
 • Do direct invocations to members of the
     same module
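The "ignore it and provide a fail path" idea can be sketched in plain Python as a guard followed by a fast path. This is only an illustration under stated assumptions, not Jython's generated code: `namespace` is a hypothetical stand-in for the module globals, and the `while` loop stands in for the Java loop the compiler would emit.

```python
import builtins

# Snapshot of what `range` meant when the code was "compiled".
_BUILTIN_RANGE = builtins.range


def sum_below_generic(n, namespace):
    """Fail path: honour whatever `range` is currently bound to."""
    range_impl = namespace.get("range", builtins.range)
    total = 0
    for i in range_impl(n):
        total += i
    return total


def sum_below_optimistic(n, namespace):
    """Fast path, guarded by a check that `range` still means the builtin."""
    if namespace.get("range", builtins.range) is not _BUILTIN_RANGE:
        return sum_below_generic(n, namespace)   # assumption broken: take the fail path
    total = 0
    i = 0
    while i < n:          # plain counting loop, standing in for the Java loop
        total += i
        i += 1
    return total


if __name__ == "__main__":
    print(sum_below_optimistic(5, {}))
    print(sum_below_optimistic(5, {"range": lambda n: [10, 20]}))
```

The guard costs one identity check per call, while the common case avoids creating and iterating a range object altogether.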
Exception handling


• The same late binding optimizations
  + optimistic exception handler
  restructuring get us far
Reaping the fruits of
    the future JVMs
• Invokedynamic can perform most optimistic
  direct calls and provide the fail path
• Interface injection makes Java objects look
  like Python objects
  • And improves integration between
    different dynamic languages even more
• The advanced compiler makes a perfect
  platform for integrating this
Summary

The “Advanced Jython
  compiler” project
• Not just a compiler - but everything close
  to the compiler - code representation
• A platform for moving forward
 • First and foremost an enabling tool
 • Actual improvement happens elsewhere
Performance
• Jython has decent performance
• On some benchmarks Jython is better
• For most “real applications” CPython is
  better
• Long running applications benefit from the
  JVM - Jython is for the server side
• We are only getting started...
Python / JVM mismatch
   - Getting better -

• Most of the problems come from trying to
  mimic CPython too closely
• Future JVMs are a better match
• Optimistic optimizations are the way to go
Thank you!

Questions?
         Tobias Ivarsson
      <tobias@thobe.org>
