A Virtual Machine for an Object-Oriented Programming Language
Thet Khine, Khine Moe Nwe
University of Computer Studies, Yangon
mrthetkhine@gmail.com
Abstract
Modern programming languages such as Java [1] and C# [7] compile programs to a platform-independent bytecode representation that runs on top of a virtual machine. The virtual machine executes the bytecode of its corresponding language and implements the instruction set that the language defines. It also handles runtime management tasks such as linking, loading, and memory allocation and deallocation. SOOL (Simple Object-Oriented Language) is a simple, modern, object-oriented programming language with a static, strong type system and language constructs for building Design Patterns. The SOOL compiler produces bytecode that runs on its own virtual machine, UVM (Unified Virtual Machine). This paper presents the design and implementation of the UVM. The design of UVM is mainly inspired by the JVM [8]. The current implementation of UVM runs on the Windows platform.
Keywords: programming language
implementation, runtime environment, virtual
machine, Object-Oriented programming.
1. Introduction
Modern programming languages such as Java [1] and C# [7] do not produce native machine code that can be executed directly on the hardware, as C++ [11] does. Instead, they produce a platform-independent representation of the program that is executed by an abstract machine or virtual machine. The virtual machine approach to language implementation offers several advantages over producing machine code: it is easier for the compiler writer to produce bytecode than machine code, programs become platform independent, and multiple languages can interoperate.
SOOL is a simple, modern, object-oriented programming language with constructs for the creation of Design Patterns. Design Patterns are described in [5]. SOOL offers the full capabilities of an O-O language, such as classes, inheritance, interfaces, virtual methods, and exception handling. It also introduces rebindable methods, free classes, the adapter clause, automatic delegation, and singleton classes. The SOOL compiler produces a bytecode representation of the program. The bytecode runs on the UVM, a virtual machine designed to run SOOL programs.
A virtual machine can be either stack based or register based. A stack-based virtual machine performs all of its operations on the expression stack, or operand stack. For example, adding two integers consists of the following operations: pop two operands from the operand stack, add them, and push the result back onto the operand stack. Java runs on the JVM and C# runs on the CLR; both are stack-based virtual machines. UVM is also a stack-based virtual machine, because code generation is simpler for a stack-based virtual machine than for a register-based one.
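The integer addition described above can be sketched in C++ as follows. This is a minimal illustration, not UVM's code: the names OperandStack and addInteger are hypothetical, and a 32-bit word is assumed.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical operand stack of 32-bit words; not UVM's actual class.
struct OperandStack {
    std::vector<int32_t> words;

    void push(int32_t v) { words.push_back(v); }

    int32_t pop() {
        assert(!words.empty());  // popping an empty stack would be a VM bug
        int32_t v = words.back();
        words.pop_back();
        return v;
    }
};

// Integer addition as a stack machine performs it:
// pop two operands, add them, push the result.
inline void addInteger(OperandStack& s) {
    int32_t right = s.pop();
    int32_t left = s.pop();
    s.push(left + right);
}
```

Pushing 2 and 3 and then calling addInteger leaves the single value 5 on the stack. The compiler never has to choose a register for the intermediate result, which is why code generation is simpler.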
A virtual machine can execute its instructions in various ways. The three most popular options are pure interpretation, AOT (Ahead-of-Time) compilation, and JIT (Just-in-Time) compilation. Pure bytecode interpreters are easy to implement and portable, but they perform worse than AOT or JIT compilation. A bytecode interpreter fetches an instruction, decodes it, and executes the action for that instruction. An AOT compiler compiles the bytecode into machine-executable code before the program runs. With a JIT compiler, the bytecode is translated into native machine instructions on the fly while the program is running, and the resulting native instructions are cached; the next time the same bytecode is executed, the cached native instructions are executed instead. JIT compilation is the most popular approach and can give better performance than pure interpretation, but it requires a complex runtime implementation.
The goal of UVM is to provide a simple and elegant virtual machine that can execute SOOL programs with the smallest possible code size and with modules that can easily be replaced. The solution must be simple and small so that undergraduate students can study it without too much effort. Its design encourages students to experiment with language implementation and virtual machine design more easily than they could with commercial or open source virtual machines. UVM uses an Indirect-Threaded Interpreter as its execution engine. UVM is written in C++. It has 111 instructions and about 5000 lines of code. By comparison, the JVM has a one-byte opcode space of 256 values, of which about 200 are assigned.
2. Related Work
Smalltalk [4] is a simple, pure object-oriented language with a dynamic type system that runs on the Smalltalk virtual machine. The Smalltalk virtual machine is presented in [4]; it is a stack-based machine. Smalltalk uses primitive methods, which are similar to the native methods of the SOOL programming language. The two major components of the Smalltalk virtual machine are the bytecode interpreter and the object memory. The bytecode interpreter executes Smalltalk instructions. The function of the object memory is to create, store, and destroy objects, and to provide access to their fields.
The JVM [8] has many characteristics in common with UVM, and many UVM design decisions are inspired by it. The JVM is presented in [8]. All Java programs must be verified to be correct before they can be executed; UVM does not provide such a facility because it is intended for use in the academic domain.
The Microsoft .NET CLI is presented in the ECMA-335 standard [13]. Unlike UVM and the JVM, it supports multiple programming languages; all .NET programming languages are compiled into MSIL (Microsoft Intermediate Language). The CLI is also a stack-based machine. The CTS (Common Type System) of the CLI enables multi-language interoperability. CLI implementations use a JIT compiler rather than an interpreter because of the nature of the instruction set.
Many recent programming languages do not build their own runtime system; instead they produce bytecode for the JVM or the CLR. For example, the Scala [10] compiler compiles the Scala language into Java bytecode. This approach has several advantages. Many libraries already exist for the JVM and the CLR, those libraries can be integrated with the new language, and the language designer does not have to implement the libraries or the virtual machine. We do not choose this approach because our goal is to provide not only a programming language and its implementation but also an educational framework that is easy for undergraduates to study. Modern virtual machines are sophisticated and complex, and are difficult for students to use as educational tools.
3. Background Theory
3.1 The SOOL programming language
The SOOL programming language is a simple, object-oriented, statically and strongly typed, general purpose language enhanced with special language constructs for the creation of Design Patterns. SOOL is designed to provide a complete set of modern O-O characteristics while keeping its implementation compact and small. The SOOL compiler produces a ucode (universal code) file for each class in the compilation unit. A ucode file is a platform-independent representation of a SOOL program. SOOL offers an exception handling mechanism for modular error detection and control. Methods of the same class can vary their implementation at runtime with the help of rebindable methods and free classes. An automatic delegation mechanism is provided that can be used to model Object Adapter, dynamic inheritance, and other common O-O idioms. The language employs a dynamic linking model.
3.2 Unified Virtual Machine
The Unified Virtual Machine is an abstract machine responsible for the execution of SOOL programs. Its design is inspired by modern language runtimes, mainly the JVM [8]. UVM is a stack-based machine that performs its operations on an operand stack. The maximum size of the operand stack required by a method is computed at compile time. UVM handles runtime management tasks such as memory management, exception handling, and object creation. Unlike the JVM and the CLI, UVM has no multithreading capabilities, so there is only one method call stack for the execution of SOOL programs, which makes the UVM simpler to implement. Bytecode verification [6] is not employed in the UVM. The SOOL programming language defines no security constraints, so the UVM has no support for security mechanisms at the virtual machine level.
3.3 Ucode file
Ucode files are produced by the SOOL compiler. They are very similar to Java .class files, with the following differences: a Java class file can contain custom attributes whereas a ucode file cannot, and a Java class file can contain debugging information whereas a ucode file does not. The structure of a ucode file is given below. A ucode file is stored in network byte order. Sizes are measured in bytes; for example, u4 means an unsigned 4-byte value.
u4 MAGIC_CODE
u1 Version
u2 ConstantPool_Count
ConstantPoolEntry [ ConstantPool_Count]
u2 classModifier
u2 thisClassIndex
u2 superClassIndex
u2 interface_count
u2 interfaces[interface_count]
u2 field_count
Field [field_count]
u2 method_count
Method [method_count]
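As an illustration of the network byte order encoding, the first three fields of the structure above could be read as in the following C++ sketch. The names ByteReader, UcodeHeader, and readHeader are hypothetical and are not taken from the UVM source; only the field order and sizes come from the structure above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical cursor over the raw bytes of a ucode file.
struct ByteReader {
    const std::vector<uint8_t>& bytes;
    std::size_t pos = 0;

    explicit ByteReader(const std::vector<uint8_t>& b) : bytes(b) {}

    uint8_t u1() {
        assert(pos < bytes.size());  // reading past the end means a truncated file
        return bytes[pos++];
    }
    // Network byte order: most significant byte first.
    uint16_t u2() {
        uint16_t hi = u1();
        uint16_t lo = u1();
        return static_cast<uint16_t>((hi << 8) | lo);
    }
    uint32_t u4() {
        uint32_t hi = u2();
        uint32_t lo = u2();
        return (hi << 16) | lo;
    }
};

// First three fields of the ucode file structure.
struct UcodeHeader {
    uint32_t magic;
    uint8_t version;
    uint16_t constantPoolCount;
};

inline UcodeHeader readHeader(ByteReader& r) {
    UcodeHeader h;
    h.magic = r.u4();
    h.version = r.u1();
    h.constantPoolCount = r.u2();
    return h;
}
```

Reading byte by byte and shifting keeps the parser independent of the byte order of the host machine.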
3.3.1 Constant pool
The constant pool stores the constants used by the program, for example string constants, integer constants, and symbolic references to classes, fields, and methods. These symbolic references are resolved into native references by the UVM at runtime. Because SOOL employs a dynamic linking model, references to external classes, fields, and methods are stored as symbolic constants and are resolved only when the runtime system needs them. A constant pool entry can be one of the following.
1. String constant entry
2. Integer constant entry
3. Long constant entry
4. Float constant entry
5. Double constant entry
6. Class entry
7. Method reference entry
8. Interface method reference entry
9. Field reference entry
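The resolve-on-first-use behaviour of symbolic references can be sketched as follows. This is an assumption-laden illustration: ClassRefEntry and resolve are hypothetical names, and the lookup callback stands in for whatever the class loader actually does.

```cpp
#include <cassert>
#include <functional>
#include <string>

// Hypothetical class entry: holds the symbolic name until first use.
struct ClassRefEntry {
    std::string symbolicName;
    const void* resolved = nullptr;  // native class representation once resolved
};

// Resolve on demand. `lookup` stands in for the class loader (assumption).
inline const void* resolve(
    ClassRefEntry& e,
    const std::function<const void*(const std::string&)>& lookup) {
    if (e.resolved == nullptr) {
        e.resolved = lookup(e.symbolicName);
        assert(e.resolved != nullptr);  // an unknown name would be a link error
    }
    return e.resolved;
}
```

A second call to resolve on the same entry returns the cached pointer without consulting the loader again, so the cost of linking is paid only for references that are actually used.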
3.3.2 Field
Fields are encoded in the following format.
u2 field modifier
u2 field name index to constant pool
u2 field type index to constant pool
The field modifier is an integer representing the attributes of the field, for example private or static. The field name index is an index into the constant pool that points to the string entry holding the name of the field. The field type index points to the constant pool string entry holding the type of the field.
3.3.3 Method
Methods are encoded in the following format.
u2 modifier
u2 method name index
u2 method signature index
u2 size of argument
u2 size of local variable
u2 maximum size of operand stack
u2 no of exception table
u2 method code length
u1 bytecode[ method code length ]
Exception Table[ no of exception table]
The method modifier is an integer representing the attributes of the method, such as private, public, protected, abstract, final, or rebindable. The method name index points to the constant pool string entry holding the name of the method. The method signature index points to the constant pool string entry holding the signature of the method. The argument size is used by method call instructions to determine how many words of parameters must be passed to the called method. The size of local variable gives the number of local variables needed by the method, measured in words. The maximum size of operand stack gives the maximum depth of the operand stack during execution of the method, measured in words. The number of exception table entries is the number of catch statements in the method.
3.3.4 Exception table
u2 from code offset
u2 to code offset
u2 target offset of exception handler
u2 index of constant pool to catch exception
The exception table defines the catch statements of the program. The from code offset is the starting offset of the try statement, the to code offset is the ending offset of the try statement, and the target offset of the exception handler gives the offset of the catch statement code that handles the exception. The last field is the index of the constant pool class entry representing the exception type that the catch statement handles.
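A handler lookup over such a table might look like the sketch below. The paper does not specify the search order or whether the end offset is inclusive, so this sketch assumes a first-match search and a half-open range [from, to). The names ExceptionEntry and findHandler, and the isInstance callback that tests the thrown object against a constant pool class index, are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// One exception table row, mirroring the four u2 fields above.
struct ExceptionEntry {
    uint16_t fromOffset;
    uint16_t toOffset;
    uint16_t handlerOffset;
    uint16_t catchTypeIndex;
};

// Returns the handler offset of the first matching entry, or -1 if none.
// Assumptions: the protected range is [from, to), and `isInstance` reports
// whether the thrown object matches the class at a constant pool index.
template <typename IsInstance>
int findHandler(const std::vector<ExceptionEntry>& table, uint16_t pc,
                IsInstance isInstance) {
    for (const ExceptionEntry& e : table) {
        assert(e.fromOffset <= e.toOffset);  // malformed rows are a compiler bug
        if (pc >= e.fromOffset && pc < e.toOffset &&
            isInstance(e.catchTypeIndex)) {
            return e.handlerOffset;
        }
    }
    return -1;  // no handler in this method; the caller's frame is tried next
}
```

When the function returns -1, an interpreter would typically pop the current method frame and repeat the search in the calling method.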
3.5 Type descriptor
A ucode file stores the type names of variables, fields, classes, method return types, and method signatures in a textual representation. The following descriptors are used to represent type names.
Type      Descriptor
string    m
byte      b
short     s
int       i
long      l
float     f
double    d
boolean   t
char      c
Class     name of the class
Arrays are denoted as follows. An array of integers is stored as [i, and an array of the Human class is stored as [Human. Multidimensional arrays are stored by prefixing the element type with as many brackets as the array has dimensions. For example, a two-dimensional array of Object is stored as [[Object.
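A descriptor in this notation can be split into its dimension count and element type with a few lines of C++. The sketch below is illustrative only; TypeDescriptor and parseDescriptor are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Parsed form of a descriptor such as "[[Object" or "i".
struct TypeDescriptor {
    int dimensions;       // number of leading '[' characters
    std::string element;  // "i", "d", ... or a class name
};

inline TypeDescriptor parseDescriptor(const std::string& text) {
    assert(!text.empty());
    std::size_t i = 0;
    while (i < text.size() && text[i] == '[') ++i;
    assert(i < text.size());  // brackets must be followed by an element type
    return TypeDescriptor{static_cast<int>(i), text.substr(i)};
}
```

For "[[Object" this yields two dimensions and the element type "Object"; for "i" it yields zero dimensions and the element type "i".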
3.6 Instruction set
UVM has 111 instructions covering load and store operations, arithmetic, logical and relational operations, jumps and control transfer, object creation, method calls, field access, array creation and array element access, exception handling, conversion, the rebind statement, and the instance-of test. The complete instruction set is not presented in this paper due to space limitations.
3.7 Instruction format
There are three instruction formats, described below.
3.7.1 Instruction format one
u1 opcode
Instruction format one consists only of a one-byte opcode. Many UVM instructions use this format, for example add_integer and sub_long.
3.7.2 Instruction format two
u1 opcode u2 Index one
The opcode is a one-byte instruction and index one is a two-byte index to a constant pool entry. Instructions such as create_object use this format; in that case index one must be the index of a constant pool class entry.
3.7.3 Instruction format three
u1 opcode u2 Index one u2 Index two
The opcode is a one-byte instruction, and index one and index two are two-byte indexes to constant pool entries. This format is used only by the rebind instruction. Both indexes refer to method reference entries in the constant pool: the first points to the original method that is to be rebound, and the second points to the method it is rebound to.
3.8 Data Types
There are five data types in UVM: integer, long, float, double, and reference. Boolean, char, byte, and short values are processed as integers. The integer, float, and reference types occupy one word; the double and long types occupy two words.
3.9 Local Variable Array
Variables declared in a method definition and parameter variables are local variables. Each local variable has an index into the local variable array. Local variables are allocated when the method frame is created and released after the method returns. The SOOL compiler calculates the index of each local variable declared in the method and records the size of the local variable array in the ucode file. Local variables are also measured in words. Local variables of type boolean, char, byte, short, int, float, and reference occupy one word; long and double occupy two words.
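The index calculation can be illustrated with the following sketch, which assigns consecutive word indexes from a list of type descriptors. The paper does not describe the compiler's actual algorithm, so the sequential assignment here is an assumption, and wordSize and assignSlots are hypothetical names.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Words occupied by a local of the given type descriptor:
// long ("l") and double ("d") take two words, everything else takes one.
inline int wordSize(const std::string& descriptor) {
    assert(!descriptor.empty());
    return (descriptor == "l" || descriptor == "d") ? 2 : 1;
}

// Assigns consecutive indexes in declaration order (assumption) and
// returns the total size of the local variable array in words.
inline int assignSlots(const std::vector<std::string>& locals,
                       std::vector<int>& indexOut) {
    int next = 0;
    indexOut.clear();
    for (const std::string& type : locals) {
        indexOut.push_back(next);
        next += wordSize(type);
    }
    return next;
}
```

For locals of types i, l, Human, d, and [d, the indexes are 0, 1, 3, 4, and 6, and the array size is 7 words. The array [d counts as a reference and therefore takes one word.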
3.10 Operand Stack
UVM is a stack-based machine: all of its operations are performed on a stack called the operand stack. For each method, the SOOL compiler determines how many words must be allocated for the operand stack and records that information in the ucode file. Because SOOL targets a stack-based machine model, no intermediate variables are needed. All intermediate values are pushed onto the stack and consumed by subsequent operations.
3.11 Method Frame
For each method call, UVM creates a method frame. A method frame consists of the following components: the local variable array, the operand stack, the constant pool of the class that owns the method, and the method to be executed. A method frame is pushed onto the central runtime method call stack when the method is called and popped after the method returns.
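A frame and its call stack could be modelled as in the sketch below. The types MethodInfo, MethodFrame, and CallStack are hypothetical, a 32-bit word is assumed, and the constant pool pointer is omitted for brevity.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical method metadata; both sizes come from the ucode file, in words.
struct MethodInfo {
    uint16_t localWords;
    uint16_t maxStackWords;
};

struct MethodFrame {
    const MethodInfo* method;           // method being executed
    std::vector<int32_t> locals;        // local variable array
    std::vector<int32_t> operandStack;  // sized once from maxStackWords
    int top = 0;                        // next free operand stack slot

    explicit MethodFrame(const MethodInfo& m)
        : method(&m),
          locals(m.localWords, 0),
          operandStack(m.maxStackWords, 0) {}
};

struct CallStack {
    std::vector<MethodFrame> frames;

    // The returned reference is valid only until the next push.
    MethodFrame& push(const MethodInfo& m) {
        frames.emplace_back(m);
        return frames.back();
    }
    void pop() {
        assert(!frames.empty());  // returning with no frame is a VM bug
        frames.pop_back();
    }
};
```

Because both sizes are known from the ucode file, a frame never needs to grow after it is created.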
3.12 Method Call Stack
UVM is a single-threaded machine, so it has only one method call stack for the execution of a SOOL program. Every method call creates a method frame and pushes it onto the central method call stack. This stack is used for method call and return, and it allows methods to be called recursively in SOOL.
3.13 Class loading and linking
Classes are loaded on demand: a class is loaded when the runtime system first needs it and is linked when necessary. Class loading involves finding the ucode file of the class, parsing it and constructing the native representations of the loaded class, allocating the static data needed by the class, calculating field offsets, and constructing the virtual table and the rebindable table. Loading a class causes its parent class and interfaces to be loaded into the runtime system as well. Each class is loaded only once. There is only one class loader in UVM, whereas the JVM and the CLR can have many custom class loaders.
3.14 Virtual table
Each class maintains a virtual table that is used to implement virtual method calls and interface method calls. The virtual table contains the addresses of the method representations of the class. The virtual table of a class is constructed after the class is loaded into the runtime system.
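The paper does not spell out how the table is filled, so the following sketch shows the conventional scheme as an assumption: copy the parent's table, overwrite the slots of overridden methods, and append newly declared methods. MethodRef and buildVtable are hypothetical names, and methods are matched by name only for brevity.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical method record: a name plus an opaque implementation id.
struct MethodRef {
    std::string name;
    int implId;
};

// Copy the parent's table, overwrite overridden slots, append new methods.
inline std::vector<MethodRef> buildVtable(
    const std::vector<MethodRef>& parent,
    const std::vector<MethodRef>& declared) {
    std::vector<MethodRef> table = parent;
    for (const MethodRef& m : declared) {
        bool overridden = false;
        for (MethodRef& slot : table) {
            if (slot.name == m.name) {
                slot = m;  // keeps the same slot index as in the parent
                overridden = true;
                break;
            }
        }
        if (!overridden) table.push_back(m);
    }
    assert(table.size() >= parent.size());  // slots are never removed
    return table;
}
```

Keeping an overridden method at its parent's slot index is what lets a call site use a single fixed index regardless of the receiver's concrete class.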
3.15 Rebindable table
The rebindable table is like the virtual table, but it is used by the call_rebinable instruction. It holds the addresses of the rebindable methods of the class. Each object also holds a rebindable table.
4. Design
All components of UVM are designed using an object-oriented approach, applying Design Patterns where appropriate. The result is a more modular and compact virtual machine in which components can easily be replaced. An overview of the UVM is shown in Fig. 1.
4.1 Class Loader
The class loader loads a class into the runtime system. When a class is loaded, all of its ancestor classes and interfaces are also loaded. All loaded classes are registered with the class manager, which stores them in a hash table. The next time a class is needed, the class manager can supply the native representation of the class directly.
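The load-once behaviour of the class manager can be sketched with a hash table keyed by class name. The names ClassRegistry and LoadedClass are hypothetical, and the counter stands in for the real work of parsing a ucode file.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <unordered_map>
#include <utility>

struct LoadedClass {
    std::string name;
};

// Hypothetical class manager: a hash table keyed by class name.
class ClassRegistry {
public:
    // Returns the cached class, loading it on the first request only.
    const LoadedClass& get(const std::string& name) {
        auto it = classes_.find(name);
        if (it == classes_.end()) {
            ++loadCount_;  // stands in for parsing the ucode file
            std::unique_ptr<LoadedClass> cls(new LoadedClass{name});
            it = classes_.emplace(name, std::move(cls)).first;
        }
        assert(it->second != nullptr);
        return *it->second;
    }
    int loadCount() const { return loadCount_; }

private:
    std::unordered_map<std::string, std::unique_ptr<LoadedClass>> classes_;
    int loadCount_ = 0;
};
```

Storing each class behind a pointer keeps references to it stable when the table grows, so other runtime structures can hold on to them.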
4.2 Memory manager
The memory manager handles heap operations requested by the other components of the system. It also manages the method call stack. All memory allocations are requested through the memory manager.
4.2.1 Heap
The heap is a region of memory for storing the objects created by user programs. Objects and arrays are stored in the heap.
4.2.2 Static Area
All user ucode files must be converted into their corresponding native representations before they can be executed on the UVM. These native representations are stored in a section of memory called the static area. Bytecode, class representations, constant pools, and methods are stored in the static area.
4.2.3 Method Call Stack
Every method call creates a stack frame that is pushed onto the central method call stack. The memory manager manages the method call stack. When a method is invoked, a method frame is allocated for it; this includes allocating the local variable array, the operand stack, and the other data structures required within the frame. The allocated frame is pushed onto the method call stack. When a method returns, the execution engine asks the memory manager to pop the frame on top of the method call stack.
4.3 Execution Engine
The execution engine is the heart of the UVM; it performs the actual interpretation of bytecode instructions. It is implemented as an Indirect-Threaded interpreter [3]. The execution engine has the following virtual registers.
PC – program counter; holds the address of the next instruction.
TopOfStack – points to the top of the operand stack of the current stack frame.
LocalVarArray – stores a pointer to the local variable array of the current stack frame.
ConstantPool – stores the constant pool of the current class.
Method – holds the native representation of the currently executing method.
Class – holds the native representation of the current class.
5. Implementation
UVM is implemented in C++. C++ was chosen for its speed, its low-level access to memory, and its object-oriented nature. The entire virtual machine consists of only 5000 lines of code, which is very small in contrast to a typical JVM implementation. JamVM [9], the smallest and simplest open source JVM implementation, consists of over 20,000 lines of C code, while the HotSpot JVM implementation takes over 400,000 lines of C++ code. UVM is so small because it is well structured using O-O techniques and because features such as garbage collection and bytecode verification are omitted. All features needed for the correct execution of SOOL programs are implemented.
[Figure 1. Architecture overview of UVM. Components shown: User .ucode files, Class loader, Memory Manager, Execution Engine, Runtime data area (Heap, Static Area, Method Call Stack), Host operating system.]
5.1 Main Sub System
UVM is internally implemented as four subsystems: the class manager, the memory manager, the native manager, and the execution engine. Each subsystem is designed as a Façade, which gives a modular and flexible way to structure complex subcomponents behind a manager. Managers interact only through the interface supplied by each manager's Façade, which reduces the interaction among the subcomponents of unrelated components.
5.1.1 Class Manager
The class manager handles all operations related to classes. It manages the class parser and the class loader and keeps the loaded classes in a hash table. The class parser reads and parses the file given by the class manager and constructs the runtime representations of the corresponding class. The class loader loads the parent classes and interfaces of the loaded class unless they are already loaded into the system. The class loader also constructs the virtual table and the rebindable table, and calculates the offsets of the fields declared in the class. It also calls the static constructor of the class if one exists. The class manager is designed using the Singleton and Façade patterns [5].
5.1.2 Memory Manager
All memory management functions are handled by the memory manager, including heap allocation, stack allocation, and static data area allocation. The memory manager is designed as a Façade, which makes the module easy to replace. All memory management facilities are hidden behind the façade class, so the internal details of the method call stack and the memory allocation strategy can be changed without affecting clients.
5.1.3 Indirect-Threaded interpreter
An indirect-threaded interpreter improves on switch-based dispatch by eliminating the central dispatch loop. It works as follows. In the executable code stream, each bytecode is replaced by the address of its associated implementation. In addition, the code required to dispatch the next bytecode is added at the end of each bytecode implementation. This is illustrated in Fig. 2.
/* addresses of the implementations (GCC labels-as-values extension) */
void *code[] = {&&load_cpool, &&load_local_int, &&load_local_long, …};
/* dispatch first instruction */
goto *code[pc++];
/* implementations */
load_cpool:
    *sp++ = constant;
    goto *code[pc++];
load_local_int:
    *sp++ = localvar[index];
    goto *code[pc++];
Fig. 2 Indirect Threaded Interpreter
5.2 Runtime data structure
5.2.1 Representing class, interface
Every class or interface is represented by a single C++ class called UVMClass. The structure of UVMClass is as follows.
class UVMClass
{
    …..
    int sizeOfInstanceVar;
    int sizeOfStaticVar;
    int noOfConstantPoolEntry;
    ConstantPoolEntry **constantPool;
    char *className;
    UVMClass *superClass;
    UVMField **fields;
    UVMMethod **methods;
    UVMInterface **interfaces;
    UVMMethod **vtable;
    UVMMethod **rebinableTable;
};
UVMClass can represent either a class or an interface. It holds information about the class: its name, its super class, its methods, its fields, its virtual table, its constant pool, and so on.
5.2.2 Object Layout
UVMClass *ownerClass
Field 1
Field 2
…….
Rebindable table
The first word of the object layout is a pointer to the structure of the object's own class. The next section is the data section, in which the fields are placed one after another. The last section holds the rebindable table used by the rebind instruction.
5.2.3 Method call
Method call operations are handled by the interpreter as follows.
• Resolve the method reference entry in the constant pool if it is not already resolved
• Allocate a method frame for the method to be invoked
• Pass parameters (copy values from the operand stack of the current stack frame to the local variable array of the newly created frame)
• Transfer control to the method.
The above procedure applies to instructions such as call_constructor and call_static. The call_interface and call_virtual instructions are a little more complicated. Calling a virtual method works as follows.
• Determine the location of the object reference in the operand stack and fetch the object
• From the object, fetch the class of the object
• Find the method to be invoked in the vtable of the class
• Allocate a frame for the method to be invoked
• Pass parameters
• Transfer control to the method.
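The virtual call steps above can be sketched as follows. This is a simplified model under stated assumptions, not the UVM implementation: a word is an intptr_t so that it can hold an object reference, the argument size counts the receiver and is supplied by the caller, and the types VMethod, VClass, VObject, and Frame and the function callVirtual are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct VMethod {
    uint16_t argWords;    // argument size in words, receiver included (assumption)
    uint16_t localWords;  // size of the local variable array in words
    int id;
};
struct VClass { std::vector<const VMethod*> vtable; };
struct VObject { const VClass* ownerClass; };  // first word of every object

struct Frame {
    const VMethod* method;
    std::vector<intptr_t> locals;
};

// `stack` is the caller's operand stack; the receiver sits below the arguments.
inline Frame callVirtual(std::vector<intptr_t>& stack, int vtableIndex,
                         uint16_t argWords) {
    assert(argWords >= 1 && stack.size() >= argWords);
    std::size_t base = stack.size() - argWords;
    // Steps 1-2: locate the receiver on the stack and fetch its class.
    const VObject* receiver = reinterpret_cast<const VObject*>(stack[base]);
    assert(receiver != nullptr);
    // Step 3: find the method in the vtable of the class.
    const VMethod* target = receiver->ownerClass->vtable.at(vtableIndex);
    assert(target->localWords >= argWords);
    // Step 4: allocate the frame.
    Frame f{target, std::vector<intptr_t>(target->localWords, 0)};
    // Step 5: pass parameters by copying operand words into the locals.
    for (uint16_t i = 0; i < argWords; ++i) f.locals[i] = stack[base + i];
    stack.resize(base);
    // Step 6 (transfer of control) is left to the interpreter loop.
    return f;
}
```

The receiver ends up in local slot 0 of the new frame and the arguments follow it, while the caller's operand stack shrinks by the argument size.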
5.3 Native Manager
Native methods are methods written in C++. They are used for I/O facilities, operating system service calls, and library methods. The native manager manages native methods. A SOOL program can mark any method with the native modifier and provide a native implementation of the method in the virtual machine implementation. The native manager supports native method calls and access to native libraries.
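One way to bind native implementations to SOOL methods is a lookup table keyed by a qualified method name, as sketched below. The paper does not describe the native manager's interface, so NativeRegistry, NativeFn, the "Class.method" key format, and the integer-only argument list are all assumptions made for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical native signature: argument words in, one result word out.
using NativeFn = std::function<int32_t(const std::vector<int32_t>&)>;

class NativeRegistry {
public:
    void bind(const std::string& qualifiedName, NativeFn fn) {
        assert(fn != nullptr);  // an empty function cannot be called
        table_[qualifiedName] = std::move(fn);
    }
    bool has(const std::string& qualifiedName) const {
        return table_.count(qualifiedName) != 0;
    }
    int32_t call(const std::string& qualifiedName,
                 const std::vector<int32_t>& args) const {
        auto it = table_.find(qualifiedName);
        assert(it != table_.end());  // unbound native method
        return it->second(args);
    }

private:
    std::unordered_map<std::string, NativeFn> table_;
};
```

With such a table, the interpreter would look up a method marked native by name and call the bound C++ function instead of allocating a bytecode frame.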
6. Conclusion
This paper presented a virtual machine for a modern object-oriented programming language. The main contribution of this work is the design and implementation of UVM (Unified Virtual Machine) for the execution of SOOL programs. UVM is structured using object-oriented design; it is modular, and its components are easily replaceable. The instruction set of UVM is designed to be as small as possible: it has 111 instructions, roughly half the size of the instruction set of a modern runtime such as the JVM. Because the instruction set is small, its implementation is also simple and small. However, the UVM instruction set is not designed for optimization, as those of the JVM and CLR are. UVM attempts to provide a complete runtime environment with an implementation that is as simple as possible. Garbage collection is not currently implemented in UVM.
7. References
[1] Ken Arnold, James Gosling and David Holmes. The Java Programming Language. Addison-Wesley, August 2005.
[2] Yannis Bres, Bernard Paul Serpette, Manuel Serrano. Compiling Scheme programs to .NET Common Intermediate Language. In Proceedings of .NET Technologies'2004.
[3] Anton M. Ertl. A portable Forth Engine. URL: http://www.complang.tuwien.ac.at/forth/threaded-code.html
[4] Adele Goldberg and David Robson. Smalltalk Design and Implementation. Addison-Wesley, 1983.
[5] E. Gamma, R. Helm, R. Johnson, J. Vlissides. Design Patterns - Elements of Reusable Object-Oriented Software. Addison-Wesley, 1994.
[6] James Gosling. Java Intermediate Bytecodes. ACM SIGPLAN Workshop on Intermediate Representations, 1995.
[7] Anders Hejlsberg, Scott Wiltamuth, Peter Golde. The C# Programming Language. Addison-Wesley, October 2003.
[8] Tim Lindholm and Frank Yellin. Java Virtual Machine Specification. Addison-Wesley, Second edition, 1999.
[9] JamVM [http://www.sourceforg.net] Robert Lougher <rob@lougher.org.uk>
[10] Martin Odersky. Scala By Example. Draft, May 2008, Programming Methods Laboratory, EPFL, Switzerland.
[11] Bjarne Stroustrup. The Design and Evolution of C++. Addison-Wesley, 2004.
[12] Wayne Kelly and John Gough. Ruby.NET: A Ruby Compiler for the Common Language Infrastructure. In Thirty-First Australasian Computer Science Conference (ACSC 2008), Wollongong, Australia.
[13] Ecma-335. Common Language Infrastructure. 4th Edition, June 2006.
[1] Ken Arnold, Jame Gosling and Daivd Holmes.
The Java Programming Language .Addison
Wesely August 2005
[2] Yannis Bres, Bernard Paul Serpette, Manuel
Serrano. Compiling Scheme programs to .NET
Common Intermediate Language, In Proceeding
of.NET Technologies'2004
[3] Anton M.Ertl .A portable Forth Engine . URL:
http://www.complang.tuwien.ac.at/forth/threaded-
code.html
[4]. Adele Goldberg and David Robson,Smalltak
Design and Implementation Addison Wesely
1983
[5]E. Gamma, R. Helm, R. Johnson, J. Vlissides,
Design Patterns - Elementsof Reusable Object-
Oriented Software, Addison-Wesley, 1994.
[6]. Jame Gosling Java Intermediate Bytecodes.
ACMSIGPLAN Workshop on Intermediate
representation, 1995
[7]. Anders Hejlsberg, Scott Wiltamuth, Peter
Golde The C# Programming Language Addison
Wesely ,October 2003.
[8]. Tim Lindholm and Frank Yelling Java Virtual
Machine Specification Addison-Wesely, Second
edition, 1999.
[9]JamVM [http://www.sourceforg.net] Robert
Lougher <rob@lougher.org.uk>
[10]. Martin Odersky .Scala By Example .Draft
May 2008,Programming Methods Laboratory
EPFL, Switzerland
[11]. Bjarne Stroustrup.The Design and Evolution
of C++. Addison Wesely 2004
[12]. Wayne Kelly and John Gough .Ruby.NET :
A Ruby Compiler for the Common Language
Infrastructure. In Thirty-First Australasian
Computer Science Conference (ACSC
2008),Wollongong,Australia
[13].Ecma-335. Common Language Infrastructure
4th Edition June 2006.

VMPaper

  • 1.
    A Virtual Machinefor an Object-Oriented Programming Language Thet Khine, Khine Moe Nwe University of Computer Studies, Yangon mrthetkhine@gmail.com Abstract Modern programming languages such as Java [1] , C# [7] produce platform independent bytecode representation of their programs that can be run on top of a Virtual Machine. Virtual Machine executes the bytecode of their corresponding programming language. Virtual Machine implements the instruction set defined by their language. They also handle various runtime managements such as liking, loading, memory allocation and deallocation etc. SOOL (Simple Object-Oriented Language) is a simple, modern, Object-Oriented programming language with static and strong type system, modern language construct for creation of Design Pattern. The compiler for SOOL produces the bytecode that can be run on its own virtual machine UVM (Unified Virtual Machine). This paper presents the design, implementation method for the UVM used by the programming language SOOL. The design of UVM is mainly inspired by JVM [8].Current implementation of UVM can be run on Window platform. Keywords: programming language implementation, runtime environment, virtual machine, Object-Oriented programming. 1. Introduction Modern programming languages Java [1] and C# [7] does not directly produce native machine code like C++ [11] that can be directly executed on the hardware machine. Instead, they produce platform independent representation of their program that can be executed by means of an abstract machine or virtual machine. Virtual machine approach to language implementation offers many advantages over producing machine code, it is easier for the compiler writer to produce bytecode representation of the program than machine code, programs can be platform independent, multiple language interoperability etc. SOOL is a simple, modern, Object- Oriented programming language with modern construct for the creation of Design Pattern. 
Design Patterns are described in [5]. It offers full capabilities of O-O language such as class, inheritance, interface, virtual method, exception handling. Rebindable method, free class, adapter clause, automatic delegation, singleton class are introduced in the language. SOOL compiler produces the bytecode representation of its program. The bytecode is run on the UVM, a virtual machine designed to run SOOL programs. Virtual machine can be either stack based or register based. Stack based virtual machine operates all of its operation on the expression stack or operand stack, for example, adding two integer consists the following operations, pop two operand stack from the operand stack, add them, push the result back onto the operand stack. Java runs on JVM and C# runs on top of CLR, both of them are stack based virtual machine. UVM is also a stack based virtual machine; code generation process is simpler in stack based virtual machine than register based one. Virtual machine can execute their instruction in various ways. One the most three popular options are pure interpretation, AOT (Ahead of Time compilation), JIT (Just in Time compilation). Pure bytecode interpreter are easy to implements and portable but they have poor performance over AOT or JIT. Bytecode interpreter fetches an instruction, decodes it and execute the action for the instruction. AOT compiler compiles the bytecode into the machine executable code In JIT compiler, the bytecode is translated into native machine instruction on the fly when the program is running and the resulting native instructions are cached, after the bytecode is executed for the next time, the cached native instruction are executed A JIT compiler are the most popular approach and can give better performance over the pure interpretation but requires complex runtime implementation and method.
The goal of UVM is to provide a simple and elegant virtual machine that can execute SOOL programs with the smallest possible code size and with modules that can easily be replaced. The solution must be simple and small so that undergraduate students can study it without too much effort. Its design encourages students to experiment with language implementation and virtual machine design more easily than they could with commercial or open source virtual machines. UVM uses an Indirect-Threaded Interpreter for its execution engine. UVM is written in C++. It has an instruction set of 111 instructions and is implemented in 5000 lines of code. By comparison, the JVM uses a one-byte opcode, which allows up to 256 instructions, of which roughly 200 are defined.

2. Related Work
Smalltalk [4] is a simple, pure Object-Oriented language with a dynamic type system that runs on the Smalltalk virtual machine. The Smalltalk virtual machine, presented in [4], is a stack-based machine. Smalltalk uses primitive methods that are similar to the native methods of the SOOL programming language. The two major components of the Smalltalk virtual machine are the bytecode interpreter and the object memory. The bytecode interpreter executes the instructions of Smalltalk; the object memory creates, stores, and destroys objects and provides access to their fields.

The JVM, presented in [8], has many characteristics in common with UVM, and many of UVM's design decisions are inspired by it. All Java programs must be verified to be correct before they can be executed. UVM does not need such a facility because it is intended to be used in the academic domain.

The Microsoft .NET CLI is presented in the ECMA 335 standard [13]. Unlike UVM and the JVM, it supports multiple programming languages: all .NET programming languages are compiled into MSIL (Microsoft Intermediate Language). The CLI is also a stack-based machine. The CTS (Common Type System) of the CLI enables multi-language interoperability. CLI implementations typically use a JIT compiler rather than an interpreter because of the nature of the instruction set.
Many current programming languages [10] do not build their own runtime system; instead, they produce bytecode for the JVM or CLR. This approach has many advantages. Many libraries already exist for the JVM and CLR and can be used from the new language, so the language designer does not need to implement the libraries or the virtual machine. We did not choose this approach because our goal is to provide not only a programming language and its implementation but also an educational framework that is easy for undergraduate students to study. Modern virtual machines are sophisticated and complex, and they are difficult for students to use as an educational tool. For example, the Scala [10] compiler compiles the Scala language into Java bytecode.

3. Background Theory

3.1 The SOOL programming language
The SOOL programming language is a simple, Object-Oriented, statically and strongly typed, general-purpose language enhanced with special language constructs for the creation of Design Patterns. SOOL is designed to provide complete modern O-O characteristics while keeping its implementation compact and small. The SOOL compiler produces a ucode (universal code) file for each class in the compilation unit. A ucode file is a platform-independent representation of a SOOL program. SOOL offers an exception handling mechanism for modular error detection and control. With the help of rebindable methods and free classes, methods of the same class can vary their implementation at runtime. An automatic delegation mechanism is provided that can be used to model Object Adapter, dynamic inheritance, and other common O-O idioms. The language employs a dynamic linking model.

3.2 Unified Virtual Machine
The Unified Virtual Machine is an abstract machine responsible for the execution of SOOL programs. Its design is inspired by modern language runtimes, mainly the JVM [8]. UVM is a stack-based machine that performs its operations on an operand stack.
The maximum size of the operand stack required by a method is computed at compile time. UVM handles runtime tasks such as memory management, exception handling, and object creation. Unlike the JVM and CLI, UVM has no multithreading capabilities, so there is only one
method call stack for the execution of SOOL programs, which makes the UVM simpler to implement. Bytecode verification [6] is not employed in the UVM, and the SOOL programming language defines no security constraints, so the UVM has no support for security mechanisms at the virtual machine level.

3.3 Ucode file
Ucode files are produced by the SOOL compiler. They are very similar to the .class files of Java, with the following differences: a Java class file can contain custom attributes and debugging information, whereas a ucode file contains neither. The structure of a ucode file is given below. A ucode file is stored in network (big-endian) byte order. Sizes are measured in bytes; for example, u4 means an unsigned 4-byte value.

u4 MAGIC_CODE
u1 Version
u2 ConstantPool_Count
ConstantPoolEntry [ConstantPool_Count]
u2 classModifier
u2 thisClassIndex
u2 superClassIndex
u2 interface_count
u2 interfaces[interface_count]
u2 field_count
Field [field_count]
u2 method_count
Method [method_count]

3.3.1 Constant pool
The constant pool stores the constants used by the program, for example string constants, integer constants, and symbolic references to classes, fields, and methods. These symbolic references are resolved into native references by the UVM at runtime. Because SOOL employs a dynamic linking model, references to external classes, fields, and methods are stored as symbolic constants and are resolved only when the runtime system needs them. A constant pool entry can be one of the following.
1. String constant entry
2. Integer constant entry
3. Long constant entry
4. Float constant entry
5. Double constant entry
6. Class entry
7. Method reference entry
8. Interface method reference entry
9. Field reference entry

3.3.2 Field
Fields are encoded in the following format.
u2 field modifier
u2 field name index to constant pool
u2 field type index to constant pool

The field modifier is an integer representing the attributes of the field, for example private or static. The field name index is an index into the constant pool pointing to the string entry that holds the name of the field. The field type index points to the constant pool string entry that holds the type of the field.

3.3.3 Method
Methods are encoded in the following format.

u2 modifier
u2 method name index
u2 method signature index
u2 size of argument
u2 size of local variable
u2 maximum size of operand stack
u2 no of exception table
u2 method code length
u1 bytecode[method code length]
Exception Table[no of exception table]

The method modifier is an integer representing the attributes of the method, such as private, public, protected, abstract, final, and rebindable. The method name index points to the constant pool string entry that holds the name of the method. The method signature index points to the constant pool string entry that holds the signature of the method. The argument size is used by method call instructions to determine how many parameters
must be passed to the called method. The size of local variable gives the number of local variables needed by the method, measured in words. The maximum size of operand stack gives the maximum depth of the operand stack during execution of the method, measured in words. The no of exception table is the number of catch statements in the method that handle exceptions.

3.3.4 Exception table
u2 from code offset
u2 to code offset
u2 target offset of exception handler
u2 index of constant pool to catch exception

The exception table defines the catch statements of the program. The from code offset is the starting offset of the try statement, the to code offset is the ending offset of the try statement, and the target offset of exception handler gives the offset of the catch statement code that handles the exception. The last index points to the constant pool class entry for the exception type that the catch statement handles.

3.5 Type descriptor
A ucode file stores the type names of variables, fields, classes, method return types, and method signatures in a textual representation. The following descriptors are used to represent type names.

Type       Descriptor
string     m
byte       b
short      s
int        i
long       l
float      f
double     d
boolean    t
char       c
Class      name of the class

Arrays are denoted by a leading bracket. An array of integers is stored as [i, and an array of the Human class is stored as [Human. Multidimensional arrays are stored by prefixing as many brackets as the array has dimensions. For example, a two-dimensional array of Object is stored as [[Object.

3.6 Instruction set
UVM has 111 instructions, covering load and store operations, arithmetic, logical, and relational operations, jumps and control transfer, object creation, method calls, field access, array creation and array element access, exception handling, conversion, the rebind statement, and the instance-of test. The complete instruction set is not presented in this paper due to space limitations.

3.7 Instruction format
There are three instruction formats. They are as follows.
3.7.1 Instruction format one
u1 opcode
Instruction format one consists only of a one-byte opcode. Many UVM instructions use this format, for example add_integer and sub_long.

3.7.2 Instruction format two
u1 opcode
u2 index one
The opcode is a one-byte instruction code, and index one is a two-byte index to a constant pool entry. Instructions such as create_object use this format; in that case index one must be the index of a constant pool class entry.

3.7.3 Instruction format three
u1 opcode
u2 index one
u2 index two
The opcode is a one-byte instruction code, and index one and index two are two-byte indexes to constant pool entries. This format is used only by the rebind instruction. Both indexes refer to method reference entries in the constant pool. The first points to the original method that is to be rebound, and the second points to the method it is rebound to.

3.8 Data Types
There are 5 data types available in UVM: integer, long, float, double, and reference. Boolean, char, byte, and short values are processed as integers in UVM. Values of type integer, float, and reference occupy one word, and values of type double and long occupy two words.

3.9 Local Variable Array
Variables declared in a method definition and parameter variables are local variables. Each local variable has an index into the local variable array. Local variables are allocated when the method is invoked and released after the method returns. The SOOL compiler calculates the index of each local variable declared in the method and records the size of the local variable array in the ucode file. Local variables are also measured in words. Local variables of type boolean, char, byte, short, int, float, and reference occupy one word; long and double occupy two words.

3.10 Operand Stack
UVM is a stack-based machine: all of its operations are performed on a stack called the operand stack. For each method, the SOOL compiler determines how many words must be allocated for the operand stack and supplies that information in the ucode file. Because SOOL works in a stack-based machine model, no intermediate variables are needed. All intermediate values are pushed onto the stack and consumed by later operations.

3.11 Method Frame
For each method call, UVM creates a method frame for the method. A method frame consists of the following components: the local variable array, the operand stack, the constant pool of the method's class, and the method to be executed. Method frames are pushed onto the central runtime method call stack on method call and popped after the method returns.

3.12 Method Call Stack
UVM is a single-threaded machine, so it has only one method call stack for the execution of a SOOL program. Every method call creates a method frame and pushes it onto the central method call stack.
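The word-based sizing described in Sections 3.8 to 3.12 can be sketched as follows. This is a minimal illustration under stated assumptions, not the UVM source: the names Word, MethodFrame, and CallStack are hypothetical, a word is assumed to be 32 bits, and a two-word long is assumed to be stored high word first. The constant pool and method pointer that a real frame also carries are omitted for brevity.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using Word = std::uint32_t;  // assumption: one UVM word is 32 bits

// Hypothetical method frame: a local variable array plus an operand stack,
// both sized in words from values the compiler stored in the ucode file.
struct MethodFrame {
    std::vector<Word> locals;
    std::vector<Word> operandStack;
    std::size_t top = 0;  // number of words currently on the operand stack

    MethodFrame(std::size_t localWords, std::size_t maxStackWords)
        : locals(localWords, 0), operandStack(maxStackWords, 0) {}

    void pushInt(std::int32_t value) {  // integer occupies one word
        assert(top + 1 <= operandStack.size());
        operandStack[top++] = static_cast<Word>(value);
    }
    std::int32_t popInt() {
        assert(top >= 1);
        return static_cast<std::int32_t>(operandStack[--top]);
    }
    void pushLong(std::int64_t value) {  // long occupies two words
        assert(top + 2 <= operandStack.size());
        const std::uint64_t bits = static_cast<std::uint64_t>(value);
        operandStack[top++] = static_cast<Word>(bits >> 32);          // high word
        operandStack[top++] = static_cast<Word>(bits & 0xFFFFFFFFu);  // low word
    }
    std::int64_t popLong() {
        assert(top >= 2);
        const std::uint64_t low = operandStack[--top];
        const std::uint64_t high = operandStack[--top];
        return static_cast<std::int64_t>((high << 32) | low);
    }
};

// Hypothetical single call stack: one frame is pushed per method call
// and popped when the method returns.
class CallStack {
public:
    void push(std::size_t localWords, std::size_t maxStackWords) {
        frames_.emplace_back(localWords, maxStackWords);
    }
    void pop() {
        assert(!frames_.empty());
        frames_.pop_back();
    }
    MethodFrame& current() {
        assert(!frames_.empty());
        return frames_.back();  // invalidated by the next push
    }
    std::size_t depth() const { return frames_.size(); }

private:
    std::vector<MethodFrame> frames_;
};
```

Because the maximum operand stack depth is known at compile time, the frame can allocate the stack once and never grow it; the assertions only guard against a compiler or bytecode error.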
The central method call stack serves both method call and return, so methods can be called recursively in the SOOL programming language.

3.13 Class loading and linking
A class must be loaded before the runtime system can use it. Classes are loaded on demand, when they are first needed by the runtime system, and are linked when necessary. Class loading involves finding the ucode file of the class, parsing it and constructing the native representations of the loaded class, allocating the static data needed by the class, calculating field offsets, and constructing the virtual table and rebindable table. Loading a class causes its parent class and interfaces to be loaded into the runtime system as well. Each class is loaded only once in UVM. UVM has only one class loader, whereas the JVM or CLR can have many custom class loaders.

3.14 Virtual table
Each class maintains a virtual table to implement virtual method calls and interface method calls. The virtual table contains the addresses of the method representations of the class. The virtual table of a class is constructed after the class is loaded into the runtime system.

3.15 Rebindable table
The rebindable table is just like the virtual table, but it is used by the call_rebinable instruction. It holds the addresses of the rebindable methods of the class. Rebindable tables are also held by objects.

4. Design
All components of UVM are designed using an Object-Oriented approach, applying Design Patterns where appropriate. The result is a more modular and compact virtual machine in which components can easily be replaced. An overview of the UVM is shown in Fig 1.
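The loading rules stated in Section 3.13 (each class is loaded once, and loading a class first loads its parent) can be sketched with a small registry backed by a hash table. This is an illustrative sketch only: LoadedClass, ClassSource, and ClassRegistry are hypothetical names, a map from class name to parent name stands in for finding and parsing real ucode files, interfaces are left out, and the inheritance chain is assumed to contain no cycles.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical native representation of a loaded class.
struct LoadedClass {
    std::string name;
    const LoadedClass* superClass = nullptr;
};

// Stand-in for ucode files on disk: class name -> parent name ("" = no parent).
using ClassSource = std::unordered_map<std::string, std::string>;

class ClassRegistry {
public:
    explicit ClassRegistry(ClassSource source) : source_(std::move(source)) {}

    // Returns the loaded class, loading it (and its ancestors) on first use.
    // Returns nullptr if the class or one of its ancestors cannot be found.
    const LoadedClass* load(const std::string& name) {
        auto found = loaded_.find(name);
        if (found != loaded_.end()) {
            return found->second.get();  // already loaded: reuse it
        }
        auto entry = source_.find(name);
        if (entry == source_.end()) {
            return nullptr;  // no ucode representation available
        }
        const LoadedClass* parent = nullptr;
        if (!entry->second.empty()) {
            parent = load(entry->second);  // the parent is loaded first
            if (parent == nullptr) {
                return nullptr;
            }
        }
        auto cls = std::make_unique<LoadedClass>();
        cls->name = name;
        cls->superClass = parent;
        const LoadedClass* result = cls.get();
        loaded_.emplace(name, std::move(cls));
        loadOrder_.push_back(name);
        return result;
    }

    const std::vector<std::string>& loadOrder() const { return loadOrder_; }

private:
    ClassSource source_;
    std::unordered_map<std::string, std::unique_ptr<LoadedClass>> loaded_;
    std::vector<std::string> loadOrder_;
};
```

In a full loader, the step that creates LoadedClass is also where field offsets would be calculated and the virtual and rebindable tables constructed, since the parent's representation is guaranteed to exist at that point.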
4.1 Class Loader
The class loader loads a class into the runtime system. When a class is loaded, all of its ancestor classes and interfaces are also loaded. All loaded classes are registered in the class manager, which stores them in a hash table, so the next time a class is needed the class manager can return its native representation.

4.2 Memory manager
The memory manager handles heap operations requested by the other components of the system. It also manages the method call stack. All memory allocations are requested through the memory manager.

4.2.1 Heap
The heap is a region of memory for storing the objects created by user programs. Objects and arrays are stored in the heap.

4.2.2 Static Area
All user ucode files must be converted into their corresponding native representations before they can be executed on the UVM. Those native representations are stored in a section of memory called the static area. Bytecode, class representations, constant pools, and methods are stored in the static area.

4.2.3 Method Call Stack
Every method call creates a stack frame that is pushed onto the central method call stack. The memory manager manages the method call stack. When a method is invoked, a method frame is allocated for it, which involves allocating the local variable array, the operand stack, and the other required data structures within the frame. The allocated method frame is pushed onto the method call stack. When a method returns, the execution engine requests the memory manager to pop the method frame on top of the method call stack.

4.3 Execution Engine
The execution engine is the heart of the UVM; it performs the actual interpretation of bytecode instructions. It is implemented as an Indirect-Threaded interpreter [3]. The execution engine has the following virtual registers.
PC: the program counter, which holds the address of the next instruction.
TopOfStack: points to the top of the operand stack of the current stack frame.
LocalVarArray: stores a pointer to the local variable array of the current stack frame.
ConstantPool: stores the constant pool of the current class.
Method: holds the native representation of the currently executing method.
Class: holds the native representation of the current class.

[Figure 1. Architecture overview of UVM. User .ucode files enter the class loader; the class loader, memory manager, and execution engine run on top of the host operating system; the runtime data area consists of the heap, the static area, and the method call stack.]

5. Implementation
UVM is implemented in C++. C++ was chosen for its speed, its low-level access to memory, and its Object-Oriented nature. The entire virtual machine consists of only 5000 lines of code, which is very small in contrast to a typical JVM implementation. JamVM [9], the smallest and simplest open source JVM implementation, consists
of over 20,000 lines of C code, while the HotSpot JVM implementation takes over 400,000 lines of C++ code. UVM is so small because it is well structured using O-O techniques and because features such as garbage collection and bytecode verification are omitted. All features needed for the correct execution of SOOL programs are implemented.

5.1 Main Sub System
UVM is internally implemented as four subsystems: the class manager, the memory manager, the native manager, and the execution engine. Each subsystem is designed with the Façade pattern, which gives a modular, flexible way to structure complex subcomponents behind managers. Managers interact only through the interface supplied by each manager's Façade, which reduces the interaction among the subcomponents of unrelated managers.

5.1.1 Class Manager
The class manager handles all operations related to classes. It manages the class parser and the class loader and keeps the loaded classes in a hash table. The class parser reads and parses the file given by the class manager and constructs the runtime representations of the corresponding class. The class loader loads the parent classes and interfaces of the loaded class if they are not already loaded into the system. It also constructs the virtual table and rebindable table and calculates the offsets of the fields declared in the class. The class loader also calls the static constructor of the class if one exists. The class manager is designed using the singleton and façade patterns [5].

5.1.2 Memory Manager
All memory management functions are handled by the memory manager, including heap allocation, stack allocation, and static data area allocation. The memory manager is designed using the façade pattern, which makes the module easy to replace. All memory management facilities are hidden behind the façade class, so the internal details of the method call stack and the memory allocation strategy can be changed without affecting clients.
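The benefit of hiding memory management behind a façade can be illustrated with a short sketch. Everything named here (HeapStrategy, BumpHeap, MemoryManager and its member functions) is hypothetical and is not taken from the UVM source. The bump allocator is only one possible strategy, chosen because it needs no deallocation, and it ignores alignment for brevity.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical allocation strategy hidden behind the facade.
class HeapStrategy {
public:
    virtual ~HeapStrategy() = default;
    virtual void* allocate(std::size_t bytes) = 0;  // nullptr when out of memory
};

// One possible strategy: hand out consecutive slices of a fixed arena.
// Nothing is ever freed; alignment is ignored in this sketch.
class BumpHeap : public HeapStrategy {
public:
    explicit BumpHeap(std::size_t capacity) : arena_(capacity), used_(0) {}

    void* allocate(std::size_t bytes) override {
        if (bytes > arena_.size() - used_) {
            return nullptr;
        }
        void* block = arena_.data() + used_;
        used_ += bytes;
        return block;
    }

private:
    std::vector<unsigned char> arena_;
    std::size_t used_;
};

// Facade: clients see only these member functions, never the strategy
// or the representation of the method call stack.
class MemoryManager {
public:
    explicit MemoryManager(std::unique_ptr<HeapStrategy> heap)
        : heap_(std::move(heap)) {}

    void* allocateObject(std::size_t bytes) { return heap_->allocate(bytes); }

    void pushFrame(std::size_t localWords, std::size_t stackWords) {
        Frame frame;
        frame.locals.assign(localWords, 0);
        frame.operandStack.assign(stackWords, 0);
        frames_.push_back(std::move(frame));
    }
    void popFrame() {
        assert(!frames_.empty());
        frames_.pop_back();
    }
    std::size_t frameDepth() const { return frames_.size(); }

private:
    struct Frame {
        std::vector<std::uint32_t> locals;
        std::vector<std::uint32_t> operandStack;
    };
    std::unique_ptr<HeapStrategy> heap_;
    std::vector<Frame> frames_;
};
```

Replacing BumpHeap with another HeapStrategy, or changing how Frame is stored, requires no change in code that calls allocateObject, pushFrame, or popFrame, which is the replaceability the façade is meant to provide.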
5.1.3 Indirect-Threaded interpreter
An indirect-threaded interpreter improves on switch-based dispatch by eliminating the central dispatch loop. It works as follows. In the executable code stream, each bytecode is replaced by the address of its associated implementation. In addition, the code required to dispatch the next bytecode is appended to the end of each bytecode implementation. This is illustrated in Fig. 2.

    /* &&label and goto *expr are the GNU C labels-as-values extension */
    void *code[] = {&&load_cpool, &&load_local_int, &&load_local_long, …};
    /* dispatch first instruction */
    goto *code[pc++];
    /* implementations */
    load_cpool:
        *sp++ = constant;
        goto *code[pc++];   /* dispatch next instruction */
    load_local_int:
        *sp++ = localvar[index];
        goto *code[pc++];   /* dispatch next instruction */

Fig. 2 Indirect Threaded Interpreter

5.2 Runtime data structure

5.2.1 Representing class, interface
Every class or interface is represented with a single C++ class called UVMClass. The following is the structure of UVMClass.

    class UVMClass {
        …..
        int sizeOfInstanceVar;
        int sizeOfStaticVar;
        int noOfConstantPoolEntry;
        ConstantPoolEntry **constantPool;
        char *className;
        UVMClass *superClass;
        UVMField **fields;
        UVMMethod **methods;
        UVMInterface **interfaces;
        UVMMethod **vtable;
        UVMMethod **rebinableTable;
    };

UVMClass can represent both a class and an interface. It holds information such as the class name, the super class, the methods and fields of the class, the virtual table, and the constant pool.

5.2.2 Object Layout

    UVMClass *ownerClass
    Field 1
    Field 2
    …….
    Rebindable table

The first word of an object is a pointer to the class structure of its own class. The next section is the data section, in which fields are placed one after another. The last section holds the rebindable table used by the rebind instruction.

5.2.3 Method call
Method call operations are handled by the interpreter as follows.
• Resolve the method reference entry in the constant pool if it is not already resolved.
• Allocate a method frame for the method to be invoked.
• Pass the parameters (copy values from the operand stack of the current stack frame to the local variable array of the newly created frame).
• Transfer control to the method.
This procedure applies to instructions such as call_constructor and call_static. The call_interface and call_virtual instructions are a little more complicated. Calling a virtual method works as follows.
• Determine the location of the object reference in the operand stack and fetch the object.
• From the object, fetch the class of the object.
• Find the method to be invoked in the vtable of the class.
• Allocate a frame for the method to be invoked.
• Pass the parameters.
• Transfer control to the method.

5.3 Native Manager
Native methods are methods written in C++. They are used for I/O facilities, operating system service calls, and library methods. The native manager manages native methods. A SOOL program can mark any method with the native modifier, and a native implementation of the method is then provided in the virtual machine implementation. The native manager supports native method calls and access to native libraries.

6. Conclusion
A virtual machine for a modern Object-Oriented programming language is presented in this paper.
The main contribution of this work is the design and implementation of UVM (Unified Virtual Machine) for the execution of SOOL programs. UVM is structured using Object-Oriented design; it is modular, and its components are easily replaceable because of its Object-Oriented nature. The instruction set of UVM is designed to be as small as possible: it has 111 instructions, roughly half the size of the instruction set of a modern runtime such as the JVM. Because the instruction set is small, its implementation is also simple and small. However, the instruction set of UVM is not designed for optimization, as those of the JVM and CLR are. UVM attempts to provide a complete runtime environment with an implementation that is as simple as possible. Garbage collection is not currently implemented in UVM.

7. References
[1] Ken Arnold, James Gosling and David Holmes. The Java Programming Language. Addison-Wesley, August 2005.
[2] Yannis Bres, Bernard Paul Serpette, Manuel Serrano. Compiling Scheme programs to .NET Common Intermediate Language. In Proceedings of .NET Technologies'2004.
[3] Anton M. Ertl. A portable Forth Engine. URL: http://www.complang.tuwien.ac.at/forth/threaded-code.html
[4] Adele Goldberg and David Robson. Smalltalk Design and Implementation. Addison-Wesley, 1983.
[5] E. Gamma, R. Helm, R. Johnson, J. Vlissides. Design Patterns - Elements of Reusable Object-Oriented Software. Addison-Wesley, 1994.
[6] James Gosling. Java Intermediate Bytecodes. ACM SIGPLAN Workshop on Intermediate Representations, 1995.
[7] Anders Hejlsberg, Scott Wiltamuth, Peter Golde. The C# Programming Language. Addison-Wesley, October 2003.
[8] Tim Lindholm and Frank Yellin. Java Virtual Machine Specification. Addison-Wesley, Second edition, 1999.
[9] JamVM [http://www.sourceforg.net] Robert Lougher <rob@lougher.org.uk>
[10] Martin Odersky. Scala By Example. Draft, May 2008, Programming Methods Laboratory, EPFL, Switzerland.
[11] Bjarne Stroustrup. The Design and Evolution of C++. Addison-Wesley, 2004.
[12] Wayne Kelly and John Gough. Ruby.NET: A Ruby Compiler for the Common Language Infrastructure. In Thirty-First Australasian Computer Science Conference (ACSC 2008), Wollongong, Australia.
[13] Ecma-335. Common Language Infrastructure, 4th Edition, June 2006.