VMs,
Interpreters,
  JIT & Co

   A First
Introduction

Marcus Denker
Overview

 •   Virtual Machines: How and Why
     – Bytecode
     – Garbage Collection (GC)

 •   Code Execution
     –   ...
Caveat


 •   Virtual Machines are not trivial!

 •   This lecture can only give a very high-level overview

 •   You will...
Virtual Machine: What’s that?

 •   Software implementation of a machine

 •   Process VMs
     – Processor emulation (e.g...
High Level Language VM

 •   We focus on HLL VMs
 •   Examples: Java, Smalltalk, Python, Perl....

 •   Three parts:
     ...
The Processor

 •   VM has an instruction set. (virtual ISA)
     – Stack machine
     – Bytecode
     – High level: model...
Garbage Collection

 •   Provides a high level model for memory
 •   No need to explicitly free memory
 •   Different impl...
Other Hardware Abstraction
 •   We need a way to access Operating System APIs
     – Graphics
     – Networking (TCP/IP)
 ...
Virtual Machine: Lessons learned
•   VM: Simulated Machine
•   Consists of
    – Virtual ISA
    – Memory Management
    –...
Bytecode

 •   Byte-encoded instruction set
 •   This means: 256 main instructions
 •   Stack based

 •   Positive:
     –...
Compiling to Bytecode



           Program Text            1 + 2




                     Compiler



                Byt...
Example: Number>>asInteger

 •   Smalltalk code:
      	

 ^1 + 2




 •   Symbolic Bytecode
      <76>      pushConstant:...
Example: Java Bytecode

 •   From class Rectangle
public int perimeter()

0: iconst_2
1: aload_0         “push this”
2: ge...
Difference Java and Squeak

 •   Instruction for arithmetic
     – Just method calls in Smalltalk
 •   Typed: special byte...
Systems Using Bytecode

 •   USCD Pascal (first, ~1970)
 •   Smalltalk
 •   Java
 •   PHP
 •   Python


 •   No bytecode:
 ...
Bytecode: Lessons Learned
•   Instruction set of a VM
•   Stack based
•   Fairly simple

•   Next: Bytecode Execution
Running Bytecode

 •   Invented for Interpretation
 •   But there are faster ways to do it

 •   We will see
     – Interp...
Interpreter

 •   Just a big loop with a case statement
                                      while (1) {
 •   Positive:  ...
Faster: Threaded Code

 •    Idea: Bytecode implementation in memory has
      Address
                                   ...
Next Step: Translation

 •   Idea: Translate bytecode to machine code

                        Bytecode                Bin...
Simple JIT Compiler

 •   JIT = “Just in Time”
 •   On first execution: Generate Code
 •   Need to be very fast
     – No t...
Bytecode Execution: Lessons Learned
•   We have seen
    – Interpreter
    – Threaded Code
    – Simple JIT

•   Next: Opt...
Optimizing Method Lookup

 •   What is slow in OO virtual machines?

 •   Overhead of Bytecode Interpretation
     – Solve...
Example Method Lookup

 •   A simple example:

      array := #(0 1 2 2.5).

      array collect: [:each | each + 1]



 •...
Method Lookup

 •   Need to look this up at runtime:
     – Get class of the receiver
     – if method not found, look in ...
Observation: Yesterday’s Weather

 •   Predict the weather of tomorrow: Same as today
 •   Is right in over 80%

 •   Simi...
Inline Cache

 •   Goal: Make sends faster in many cases
 •   Solution: remember the last lookup

 •   Trick:
     – Inlin...
Inline Cache: First Execution

 First Call:

                                     1) lookup the method



                ...
Inline Cache: First Execution

 First Call:

                                       1) lookup the method



              ...
Inline Cache: First Execution

 First Call:

                                       1) lookup the method
                 ...
Inline Cache: First Execution

 First Call:

                                       1) lookup the method
                 ...
Inline Cache: First Execution

 First Call:

                                       1) lookup the method
                 ...
Inline Cache: Second Execution

 Second Call:
                                     - we jump directly to the test
        ...
Limitation of Simple Inline Caches

 •   Works nicely at places were only one method is
     called. “Monomorphic sends”. ...
Example Polymorphic Send

 •   This example is Polymorphic.

      array := #(1 1.5 2 2.5 3 3.5).

      array collect: [:...
Polymorphic Inline Caches

 •   Solution: When inline cache fails, build up a PIC

 •   Basic idea:
     – For each receiv...
PIC


                                                check receiver
                                                type
...
PIC

 •   PICs solve the problem of Polymorphic sends
 •   Three steps:
     – Normal Lookup / generate IC
     – IC looku...
Optimizations: Lessons Learned
•   We have seen
    – Inline Caches
    – Polymorphic Inline Caches

•   Next: Beyond Just...
Beyond JIT: Dynamic Optimization

 •   Just-In-Time == Not-Enough-Time
     – No complex optimization possible
     – No w...
Excursion: Optimizations

 •   Or: Why is a modern compiler so slow?

 •   There is a lot to do to generate good code!
   ...
State of the Art: HotSpot et. al.

 •   Pioneered in Self
 •   Use multiple compilers
     – Fast but bad code
     – Slow...
PIC Data for Inlining

•   Use type information from PIC for specialisation
•   Example: Class Point

                    ...
Example Inlining

   •   The PIC will be build



                               If type = CarthesianPoint
               ...
Example Inlining

 •   We can inline code for all known cases



       ...
       if type = cartesian point
       
 resu...
End


•   What is a VM
•   Overview about bytecode interpreter / compiler
•   Inline Caching / PICs
•   Dynamic Optimizati...
Literature
 •   Smith/Nair:Virtual Machines (Morgan Kaufman
     August 2005). Looks quite good!

 •   For PICs:
     – Ur...
Upcoming SlideShare
Loading in...5
×

VMs, Interpreters, JIT

1,491

Published on

Lecture: "VMs, Interpreters, JIT", DCC University of Chile , October 2005

Published in: Technology
0 Comments
12 Likes
Statistics
Notes
  • Be the first to comment

No Downloads
Views
Total Views
1,491
On Slideshare
0
From Embeds
0
Number of Embeds
1
Actions
Shares
0
Downloads
69
Comments
0
Likes
12
Embeds 0
No embeds

No notes for slide

VMs, Interpreters, JIT

  1. 1. VMs, Interpreters, JIT & Co A First Introduction Marcus Denker
  2. 2. Overview • Virtual Machines: How and Why – Bytecode – Garbage Collection (GC) • Code Execution – Interpretation – Just-In-Time – Optimizing method lookup – Dynamic optimization Marcus Denker
  3. 3. Caveat • Virtual Machines are not trivial! • This lecture can only give a very high-level overview • You will not be a VM Hacker afterwards ;-) • Many slides of this talk should be complete lectures (or even courses) Marcus Denker
  4. 4. Virtual Machine: What’s that? • Software implementation of a machine • Process VMs – Processor emulation (e.g. run PowerPC on Intel) • FX32!, MacOS 68k, powerPC – Optimization: HP Dynamo – High level language VMs • System Virtual Machines – IBM z/OS – Virtual PC Marcus Denker
  5. 5. High Level Language VM • We focus on HLL VMs • Examples: Java, Smalltalk, Python, Perl.... • Three parts: – The Processor • Simulated Instruction Set Architecture (ISA) – The Memory: Garbage Collection – Abstraction for other OS / Hardware (e.g, I/O) • Very near to the implemented language • Provides abstraction from OS Marcus Denker
  6. 6. The Processor • VM has an instruction set. (virtual ISA) – Stack machine – Bytecode – High level: models the language quite directly • e.g, “Send” bytecode in Smalltalk • VM needs to run this Code • Many designs possible: – Interpreter (simple) – Threaded code – Simple translation to machine code – Dynamic optimization (“HotSpot”) Marcus Denker
  7. 7. Garbage Collection • Provides a high level model for memory • No need to explicitly free memory • Different implementations: – Reference counting – Mark and sweep – Generational GC • A modern GC is *very* efficient • It’s hard to do better Marcus Denker
  8. 8. Other Hardware Abstraction • We need a way to access Operating System APIs – Graphics – Networking (TCP/IP) – Disc / Keyboard.... • Simple solution: – Define an API for this Hardware – Implement this as part of the VM (“Primitives”) – Call OS library functions directly Marcus Denker
  9. 9. Virtual Machine: Lessons learned • VM: Simulated Machine • Consists of – Virtual ISA – Memory Management – API for Hardware/OS • Next: Bytecode
  10. 10. Bytecode • Byte-encoded instruction set • This means: 256 main instructions • Stack based • Positive: – Very compact – Easy to interpret • Important: The ISA of a VM does not need to be Bytecode. Marcus Denker
  11. 11. Compiling to Bytecode Program Text 1 + 2 Compiler Bytecode 76 77 B0 7C Marcus Denker
  12. 12. Example: Number>>asInteger • Smalltalk code: ^1 + 2 • Symbolic Bytecode <76> pushConstant: 1 <77> pushConstant: 2 <B0> send: + <7C> returnTop Marcus Denker
  13. 13. Example: Java Bytecode • From class Rectangle public int perimeter() 0: iconst_2 1: aload_0 “push this” 2: getfield#2 “value of sides” 5: iconst_0 6: iaload 7: aload_0 2*(sides[0]+sides[1]) 8: getfield#2 11: iconst_1 12: iaload 13: iadd 14: imul 15: ireturn Marcus Denker
  14. 14. Difference Java and Squeak • Instruction for arithmetic – Just method calls in Smalltalk • Typed: special bytecode for int, long, float • Bytecode for array access • Not shown: – Jumps for loops and control structures – Exceptions (java) – .... Marcus Denker
  15. 15. Systems Using Bytecode • USCD Pascal (first, ~1970) • Smalltalk • Java • PHP • Python • No bytecode: – Ruby: Interprets the AST Marcus Denker
  16. 16. Bytecode: Lessons Learned • Instruction set of a VM • Stack based • Fairly simple • Next: Bytecode Execution
  17. 17. Running Bytecode • Invented for Interpretation • But there are faster ways to do it • We will see – Interpreter – Threaded code – Translation to machine code (JIT, Hotspot) Marcus Denker
  18. 18. Interpreter • Just a big loop with a case statement while (1) { • Positive: bc = *ip++; – Memory efficient switch (bc) { – Very simple ... case 0x76: – Easy to port to new machines *++sp = ConstOne; break; • Negative: ... } – Slooooooow... } Marcus Denker
  19. 19. Faster: Threaded Code • Idea: Bytecode implementation in memory has Address Bytecode Addresses push1: *++sp = ConstOne translate goto *ip++; push2: • Pro: *++sp = ConstTwo – Faster goto *ip++; – Still fairly simple – Still portable – Used (and pioneered) in Forth. – Threaded code can be the ISA of the VM Marcus Denker
  20. 20. Next Step: Translation • Idea: Translate bytecode to machine code Bytecode Binary Code • Pro: translate – Faster execution – Possible to do optimizations – Interesting tricks possible: Inline Caching (later) • Negative – Complex – Not portable across processors – Memory Marcus Denker
  21. 21. Simple JIT Compiler • JIT = “Just in Time” • On first execution: Generate Code • Need to be very fast – No time for many code optimizations • Code is cached (LRU) • Memory: – Cache + implementation JIT in RAM. – We trade memory for performance. Marcus Denker
  22. 22. Bytecode Execution: Lessons Learned • We have seen – Interpreter – Threaded Code – Simple JIT • Next: Optimizations
  23. 23. Optimizing Method Lookup • What is slow in OO virtual machines? • Overhead of Bytecode Interpretation – Solved with JIT • Method Lookup – Java: Polymorph – Smalltalk: Polymorph and dynamically typed – Method to call can only be looked up at runtime Marcus Denker
  24. 24. Example Method Lookup • A simple example: array := #(0 1 2 2.5). array collect: [:each | each + 1] • The method “+” executed depends on the receiving object Marcus Denker
  25. 25. Method Lookup • Need to look this up at runtime: – Get class of the receiver – if method not found, look in superclass Code for lookup (in the VM) .... lookup(obj, sel) .... Code of Method Found Binary Code (generated by JIT) Marcus Denker
  26. 26. Observation: Yesterday’s Weather • Predict the weather of tomorrow: Same as today • Is right in over 80% • Similar for method lookup: – Look up method on first execution of a send – The next lookup would likely get the same method • True polymorphic sends are seldom Marcus Denker
  27. 27. Inline Cache • Goal: Make sends faster in many cases • Solution: remember the last lookup • Trick: – Inline cache (IC) – In the binary code of the sender Marcus Denker
  28. 28. Inline Cache: First Execution First Call: 1) lookup the method .... lookup(obj, sel) .... Binary Code (generated by JIT) Marcus Denker
  29. 29. Inline Cache: First Execution First Call: 1) lookup the method .... lookup(obj, sel) .... Code of Method Binary Code (generated by JIT) Marcus Denker
  30. 30. Inline Cache: First Execution First Call: 1) lookup the method 2) generate test preamble .... lookup(obj, sel) .... check receiver type Code of Method Binary Code (generated by JIT) Marcus Denker
  31. 31. Inline Cache: First Execution First Call: 1) lookup the method 2) generate test preamble 3) patch calling method .... call preamble .... check receiver type Code of Method Binary Code (generated by JIT) Marcus Denker
  32. 32. Inline Cache: First Execution First Call: 1) lookup the method 2) generate test preamble 3) patch calling method 4) execute method and return .... call preamble .... check receiver type Code of Method Binary Code (generated by JIT) Marcus Denker
  33. 33. Inline Cache: Second Execution Second Call: - we jump directly to the test - If test is ok, execute the method - If not, do normal lookup .... call preamble .... wrong check receiver normal type lookup Code of Method Binary Code (generated by JIT) Marcus Denker
  34. 34. Limitation of Simple Inline Caches • Works nicely at places were only one method is called. “Monomorphic sends”. >80% • How to solve it for the rest? • Polymorphic sends (<15%) – <10 different classes • Megamophic sends (<5%) – >10 different classes Marcus Denker
  35. 35. Example Polymorphic Send • This example is Polymorphic. array := #(1 1.5 2 2.5 3 3.5). array collect: [:each | each + 1] • Two classes: Integer and Float • Inline cache will fail at every send • It will be slower than doing just a lookup! Marcus Denker
  36. 36. Polymorphic Inline Caches • Solution: When inline cache fails, build up a PIC • Basic idea: – For each receiver, remember the method found – Generate stub to call the correct method Marcus Denker
  37. 37. PIC check receiver type Code of Method If type = Integer (Integer) .... jump call pic If type = Float .... jump else call lookup check receiver type Binary Code Code of Method (generated by JIT) (Float) Marcus Denker
  38. 38. PIC • PICs solve the problem of Polymorphic sends • Three steps: – Normal Lookup / generate IC – IC lookup, if fail: build PIC – PIC lookup • PICs grow to a fixed size (~10) • After that: replace entries • Megamorphic sends: – Will be slow – Some systems detect them and disable caching Marcus Denker
  39. 39. Optimizations: Lessons Learned • We have seen – Inline Caches – Polymorphic Inline Caches • Next: Beyond Just-In-Time
  40. 40. Beyond JIT: Dynamic Optimization • Just-In-Time == Not-Enough-Time – No complex optimization possible – No whole-program-optimization • We want to do real optimizations! Marcus Denker
  41. 41. Excursion: Optimizations • Or: Why is a modern compiler so slow? • There is a lot to do to generate good code! – Transformation in a good intermediate form (SSA) – Many different optimization passes • Constant subexpression elimination (CSE) • Dead code elimination • Inlining – Register allocation – Instruction selection Marcus Denker
  42. 42. State of the Art: HotSpot et. al. • Pioneered in Self • Use multiple compilers – Fast but bad code – Slow but good code • Only invest time were it really pays off • Here we can invest some more • Problem:Very complicated, huge warmup, lots of Memory • Examples: Self, Java Hotspot, Jikes Marcus Denker
  43. 43. PIC Data for Inlining • Use type information from PIC for specialisation • Example: Class Point CarthesianPoint>>x ^x ... lookup(rec, sel) .... PolarPoint>>x “compute x from rho and theta” Binary Code (generated by JIT) Marcus Denker
  44. 44. Example Inlining • The PIC will be build If type = CarthesianPoint jump The PIC contains ... If type = PolarPoint call pic jump type Information! .... else call lookup Binary Code (generated by JIT) Marcus Denker
  45. 45. Example Inlining • We can inline code for all known cases ... if type = cartesian point result ← receiver.x else if type = polar point result ← receiver.rho * cos(receiver.theta) else call lookup ... Binary Code (generated by JIT) Marcus Denker
  46. 46. End • What is a VM • Overview about bytecode interpreter / compiler • Inline Caching / PICs • Dynamic Optimization • Questions?
  47. 47. Literature • Smith/Nair:Virtual Machines (Morgan Kaufman August 2005). Looks quite good! • For PICs: – Urs Hölzle, Craig Chambers, David Ungar: Optimizing Dynamically-Typed Object-Oriented Languages With Polymorphic Inline Caches – Urs Hoelzle. "Adaptive Optimization for Self: Reconciling High Performance with Exploratory Programming." Ph.D. thesis, Computer Science Department, Stanford University, August 1994. Marcus Denker
  1. A particular slide catching your eye?

    Clipping is a handy way to collect important slides you want to go back to later.

×