  • 1. CSC 310 – Imperative Programming Languages, Spring, 2009 Virtual Machines and Threaded Intermediate Code (instead of PR Chapter 5 on Target Machine Architecture)
  • 2. Virtual Machines and Language Design
    • A VM is an idealization of the run-time processor for a family of source languages.
    • A VM’s instructions, register set, memory access architecture and threading architecture have close mappings to source language requirements.
    • A VM is interpreted via processor simulation.
        • Java compiles to portable Java virtual machine code.
        • P-code helped popularize Pascal in the 1980s.
  • 3. Virtual Machine Approaches from Programming Language Pragmatics
    • Pure compilation – optimize target execution at the cost of compile/optimize/link time
    • Pure interpretation – optimize development time and portability
  • 4. Intermediate Code VMs Combine Both Approaches
    • VM provides a portable back end for interpreting programs on different processors.
    • VM provides a portable back end for compiled code generation.
    • VM provides a means for designing a language or processor using simulation.
  • 5. Varieties of Interpreted VMs (A thread is a pointer in this discussion!)
    • Direct interpretation of source code strings.
        • String threading: string -> code mapping and execution via a hash table in early versions of Forth. This approach is slow.
    • Interpretation of a compiler’s abstract syntax tree.
        • This approach is slow and is a poor emulation mechanism.
    • Subroutine threaded virtual machine code.
    • Direct threaded virtual machine code.
        • A sequence of subroutine calls emulates a sequence of machine instructions. Direct threading dispenses with the call instructions.
    • Indirect threaded virtual machine code.
        • Each instruction is a pointer to a pointer to the code.
    • Token threaded code uses non-pointer byte codes.
  • 6. Forth-like Virtual Machine
    • Forth language pioneered threaded intermediate code, extensible compilers, and stack-based postfix virtual machines.
    • Dense code for small footprint systems.
        • Influenced Postscript, JVM, Sparc bootloader.
        • http://www.forth.org/ http://www.forth.com/
        • http://www.complang.tuwien.ac.at/forth/threaded-code.html
        • http://en.wikipedia.org/wiki/Threaded_code#Threading_models See also our course web page for links.
  • 7. Stack-based Threaded Emulated Processor (STEP)
    • A Forth-like VM architecture from Bell Labs, 1997, used to control remote execution and debugging of digital signal processors (DSPs).
    • It predates availability of a JVM-ME with tool support (end of 1998 and later).
    • “STEP: A Stack-based Controller for HDM Tap Managers” gives the VM architecture and implementation in the C language.
    • We will build a similar, scaled down implementation of the VM using indirect threaded code in Python.
  • 8. STEP VM Architecture
    • ip is the instruction pointer; it points to VM machine instructions.
    • The data stack holds params and return values; ds is its stack pointer.
    • The thread stack holds return addresses for ip; ts is its stack pointer.
    • The dictionary holds compiled code and data.
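The four components above can be sketched as a small Python class. This is a hypothetical layout for our scaled-down VM, not the STEP paper's C implementation; the attribute names and the use of Python lists as stacks are illustrative assumptions.

```python
# Sketch of the STEP VM state; names are illustrative, not from the paper.
class StepVM:
    def __init__(self):
        self.dictionary = []    # compiled code and data
        self.data_stack = []    # params and return values (top of list = ds)
        self.thread_stack = []  # return addresses for ip (top of list = ts)
        self.ip = 0             # instruction pointer into the dictionary

    def push(self, value):
        self.data_stack.append(value)

    def pop(self):
        return self.data_stack.pop()

vm = StepVM()
vm.push(5)
vm.push(7)
total = vm.pop() + vm.pop()
print(total)  # 12
```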
  • 9. Threaded code has Primitives and Secondaries.
    • A primitive is a sequence of machine code instructions invoked via a subroutine call, an indirect jump, or a doubly indirect jump.
    • A primitive reached via a jump finishes by jumping back to a threaded code interpreter for the VM.
    • A secondary is a sequence of threaded code “opcodes,” possibly intermixed with in-line data. It is typically a subroutine defined in the emulated language. Its opcodes are pointers to primitives, including call instructions for secondaries.
  • 10. Subroutine Threaded Code
    • A threaded code secondary consists of a sequence of subroutine calls in machine code.
        • 1. call push_inline() to push an inline int to data stack
          • 2. an in-line value of 5 (calls intermixed w. in-line data)
          • machine code primitive must adjust return address
        • 3. call push_inline() to push an in-line data addr to stack
          • 4. an in-line data address
        • 5. call fetch() to pop the address and push its data
        • 6. call add_function() to add top 2 stack elements, leaving the sum on the stack
        • 7. call printi to pop and print the int on the data stack
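Python cannot place data words after a call instruction and patch the return address, so we can only approximate subroutine threading. The following sketch of the seven steps above passes the in-line operands as arguments instead; the data address 100 and its stored value 7 are made-up illustrations.

```python
memory = {100: 7}   # hypothetical data memory: address 100 holds 7
stack = []

def push(value):     # steps 1-4: push_inline with its in-line operand
    stack.append(value)

def fetch():         # step 5: pop an address, push the data found there
    stack.append(memory[stack.pop()])

def add():           # step 6: add the top two stack elements
    b, a = stack.pop(), stack.pop()
    stack.append(a + b)

def printi():        # step 7: pop and print the result
    print(stack.pop())

def example():       # the "secondary": just a sequence of subroutine calls
    push(5)
    push(100)
    fetch()
    add()
    printi()

example()   # prints 12
```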
  • 11. Direct Threaded Code
    • Subroutine threaded code without the call opcodes.
        • 1. push_inline to push an inline int to data stack
          • 2. an in-line value of 5 (calls intermixed w. in-line data)
          • machine code primitive must adjust instruction pointer
        • 3. push_inline to push an in-line data addr to stack
          • 4. an in-line data address
        • 5. fetch to pop the address and push its data
        • 6. add_function to add top 2 stack elements, leaving the sum on the stack
        • 7. printi to pop and print the int on the data stack
    • An inner interpreter fetches and jumps to the subroutines. They jump back to the inner interpreter.
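The steps above can be sketched in Python under the assumption that function objects stand in for machine-code addresses. The thread is a list of function pointers intermixed with in-line data; push_inline adjusts the instruction pointer past its operand, and the inner interpreter fetches each opcode and jumps to it. Memory address 100 and the value 7 are made up to match the running example.

```python
memory = {100: 7}   # hypothetical data memory: address 100 holds 7
stack = []
ip = 0

def push_inline():
    global ip
    stack.append(thread[ip])   # the in-line operand follows the opcode
    ip += 1                    # the primitive itself adjusts ip past the data

def fetch():
    stack.append(memory[stack.pop()])

def add():
    b, a = stack.pop(), stack.pop()
    stack.append(a + b)

def printi():
    print(stack.pop())

# the secondary: opcodes are plain function pointers, intermixed with data
thread = [push_inline, 5, push_inline, 100, fetch, add, printi]

def run():
    # inner interpreter: fetch the next opcode and jump to it;
    # each primitive returns here when it finishes
    global ip
    ip = 0
    while ip < len(thread):
        op = thread[ip]
        ip += 1
        op()

run()   # prints 12
```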
  • 12. Indirect Threaded Code
    • Indirect threaded code adds another level of indirection.
    • A threaded opcode is no longer a pointer to a subroutine. It is a pointer to a pointer to a subroutine.
    • This second pointer resides with the data fields that were previously in-line.
    • No in-line data with indirect threaded code.
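A minimal sketch of the double indirection in Python, reusing the running example: each opcode in the thread points at a (code, data) pair, and the pair in turn points at the primitive. Constants live with the pair rather than in-line, so a pair like LIT5 can be shared by many threads. The names and the tuple representation are illustrative assumptions.

```python
memory = {100: 7}   # hypothetical data memory: address 100 holds 7
stack = []

def push_const(data):
    stack.append(data)

def fetch(_):
    stack.append(memory[stack.pop()])

def add(_):
    b, a = stack.pop(), stack.pop()
    stack.append(a + b)

def printi(_):
    print(stack.pop())

# each pair binds a primitive to its data field; constants such as 5 live
# here instead of in-line, so the pairs can be interned and shared
LIT5   = (push_const, 5)
LIT100 = (push_const, 100)
FETCH  = (fetch, None)
ADD    = (add, None)
PRINTI = (printi, None)

def run(thread):
    for pair in thread:    # each opcode is a pointer to a pair...
        code, data = pair  # ...which in turn points to the code
        code(data)

run([LIT5, LIT100, FETCH, ADD, PRINTI])   # prints 12
```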
  • 13. Indirect Threaded Code (continued)
    • Code dictionaries are smaller because in-line data are removed.
    • Interning constant data allows a single copy of a subroutine-data pair to be allocated.
        • Multiple in-line fetches of a constant value 3, for example, can occur from multiple threaded instructions using a single copy of a fetch_function,3 binding.
    • Indirect threaded code runs faster on architectures that separate code and data such as a Harvard Architecture typical of Digital Signal Processors (DSPs) and Network Processors.
    • Indirect threaded code better supports “in-lining” of nested code expressions, e.g. an in-line lambda expression.
  • 14. Token Threaded Code
    • These are the byte codes of the Java Virtual Machine, for example.
    • These run slower than pointer-based threaded code, but byte codes are portable without recompilation or relocation. Pointer-based threaded code is tied to its run-time process’ address space layout.
    • Mapping byte codes to subroutines (e.g., via hashing) or using them to control switch/case statements are two means of interpretation.
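The switch/case style of dispatch can be sketched in Python with an if/elif chain keyed on small integer opcodes, again using the running example. The opcode numbering and the tiny instruction set are made up for illustration; they are not the JVM's byte codes.

```python
memory = {100: 7}   # hypothetical data memory: address 100 holds 7
stack = []

# token threading: opcodes are small portable integers, not pointers
PUSH_INLINE, FETCH, ADD, PRINTI = range(4)

def run(bytecode):
    ip = 0
    while ip < len(bytecode):
        op = bytecode[ip]          # fetch the next token
        ip += 1
        if op == PUSH_INLINE:      # switch/case-style dispatch on the token
            stack.append(bytecode[ip])
            ip += 1                # skip past the in-line operand
        elif op == FETCH:
            stack.append(memory[stack.pop()])
        elif op == ADD:
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == PRINTI:
            print(stack.pop())

run([PUSH_INLINE, 5, PUSH_INLINE, 100, FETCH, ADD, PRINTI])   # prints 12
```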
  • 15. Just-in-time Compilation and HotSpot JIT Compilation
    • JIT compilation replaces a secondary with a machine code routine the first time it is interpreted.
    • There is some performance penalty during compilation, and growth in memory usage.
    • Not all secondaries are in the critical path.
    • HotSpot compilation gathers run-time statistics on secondaries and compiles only the frequently executed ones to machine code.
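A toy sketch of the counting idea, under stated assumptions: the threshold, the names, and the use of a Python lambda as a stand-in for generated machine code are all illustrative, and the real HotSpot VM uses far richer profiling and tiered compilation than a bare invocation counter.

```python
HOT_THRESHOLD = 2   # hypothetical invocation count before compiling
counts = {}
compiled = {}

def interpret(secondary):
    # slow path: one dispatch per step, as in the threaded interpreters above
    total = 0
    for value in secondary:
        total += value
    return total

def execute(name, secondary):
    # count invocations; compile the secondary once it proves to be "hot"
    counts[name] = counts.get(name, 0) + 1
    if name not in compiled and counts[name] >= HOT_THRESHOLD:
        compiled[name] = lambda: sum(secondary)  # stand-in for machine code
    if name in compiled:
        return compiled[name]()                  # fast path after compilation
    return interpret(secondary)

for _ in range(3):
    result = execute("demo", [5, 7])
print(result, "demo" in compiled)   # 12 True
```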