Metascala
A tiny DIY JVM
https://github.com/lihaoyi/Metascala
Li Haoyi
haoyi@dropbox.com
Scala Exchange
2nd Dec 2013
Who am I?
Li Haoyi
Write Python during the day
Write Scala at night
What is Metascala?
● A JVM
● in 3000 lines of Scala
● Which can load & interpret java programs
● And can interpret itself!
Size comparison
● Metascala:

~3,000 lines

● Avian JVM: ~80,000 lines
● OpenJDK:

~1,000,000 lines
Basic Usage

Create a new metascala VM
Plain Old Java Object

Captured variables are serialized
into VM’s environment

Clo...
It’s Metacircular!

Need to give the outer VM more
than the 1mb default heap

VM inside
a VM!
Simpler program avoids
initi...
Limitations
● Single-threaded
● Limited IO
● Slowww
Performance Comparison
● OpenJDK:

1.0x

● Metascala:

~100x

● Meta-Metascala: ~10000x
Why Metascala?
● Fun to explore the innards of the JVM
● An almost-fully secure Java runtime!
● Small size makes fiddling ...
Why Metascala?
● Fun to explore the innards of the JVM
● An almost-fully secure Java runtime!
● Small size makes fiddling ...
Quick Tour

Immutable Defs: ~380 loc
Native Bindings: ~650 loc
Bytecode SSA transform: ~650 loc
Runtime data structures:
8...
Quick Tour: Tests
Tests for basic Java features

GC Fuzz-tests
Test Metacircularity!
Scala std lib usage
What’s a Heap?

Fig 1. A Heap
What’s a Garbage Collector?
Blit (copy) all roots to new heap
Stop when you’ve
scanned everything

Not pseudocode

Scan th...
Why Metascala?
● Fun to explore the innards of the JVM
● An almost-fully secure Java runtime!
● Small size makes fiddling ...
Limited Instruction Count
And Limited Memory!

Not an OOM Error!
We throw this ourselves
Explicitly defined capabilities

Every external call has to
be explicitly defined and
enabled
Security Characteristics
●
●
●
●

Finite instruction count
Finite memory
Well-defined interface to outside world
Doesn’t r...
Security Holes
●
●
●
●
●

Classloader can read from anywhere
Time spent classloading not accounted
Memory spent on classes...
Basic Problem
Outside World

User code resource
consumption is bounded
VM’s runtime resource
usage can be made to
grow arb...
Possible Solution
Put a VM Inside a VM!
Works,
... but 10000x slowdown

Outside World
Fixed Unaccounted Costs
Outside Worl...
Another Possible Solution
Move more components into
virtual runtime
Difficult to bootstrap correctly
WIP

Outside World
Na...
Why Metascala?
● Fun to explore the innards of the JVM
● An almost-fully secure Java runtime!
● Small size makes fiddling ...
Live Demo
Ugliness
● Long compile times
● Nasty JVM Interface
● Impossible Debugging
Long compile times
●
● 100 lines/s
● Twice as slow (50 lines/s) on my older
machine!
Nasty JVM Interface
Ideal World
Initialized
User Code

Real World
Linear Initialization

Std Library

Initialized
User Cod...
Java’s dirty little secret
The Verbosity of Java with the Safety of C

WTF! I’d never use these things!
You probably do What happens if you don’t have
them

Almost every Java program
ever uses these things.
Next Steps
● Maximize correctness
○ Implement Threads & IO
○ Fix bugs (GC, native calls, etc.)

● Solidify security charac...
Possible Experiments
● Native codegen instead of an interpreter?
○ Generate/exec native code through JNI
○ Heap is already...
Metascala: a tiny DIY JVM
Ask me about:
● Single Static Assignment form
● Copying Garbage Collection
● sun.misc.Unsafe
● W...
Upcoming SlideShare
Loading in...5
×

Scala e xchange 2013 haoyi li on metascala a tiny diy jvm

1,179

Published on

Metascala is a tiny metacircular Java Virtual Machine (JVM) written in the Scala programming language. Metascala is barely 3000 lines of Scala, and is complete enough that it is able to interpret itself metacircularly. Being written in Scala and compiled to Java bytecode, the Metascala JVM requires a host JVM in order to run.

The goal of Metascala is to create a platform to experiment with the JVM: a 3000 line JVM written in Scala is probably much more approachable than the 1,000,000 lines of C/C++ which make up HotSpot, the standard implementation, and more amenable to implementing fun features like continuations, isolates or value classes. The 3000 lines of code gives you:

The bytecode interpreter, together with all the run-time data structures
A stack-machine to SSA register-machine bytecode translator
A custom heap, complete with a stop-the-world, copying garbage collector
Implementations of parts of the JVM's native interface
Although it is far from a complete implementation, Metascala already provides the ability to run untrusted bytecode securely (albeit slowly), since every operation which could potentially cause harm (including memory allocations and CPU usage) is virtualized and can be controlled. Ongoing work includes tightening of the security guarantees, improving compatibility and increasing performance.

ENJOYIN

Published in: Technology
0 Comments
1 Like
Statistics
Notes
  • Be the first to comment

No Downloads
Views
Total Views
1,179
On Slideshare
0
From Embeds
0
Number of Embeds
3
Actions
Shares
0
Downloads
3
Comments
0
Likes
1
Embeds 0
No embeds

No notes for slide

Scala e xchange 2013 haoyi li on metascala a tiny diy jvm

  1. 1. Metascala A tiny DIY JVM https://github.com/lihaoyi/Metascala Li Haoyi haoyi@dropbox.com Scala Exchange 2nd Dec 2013
  2. 2. Who am I? Li Haoyi Write Python during the day Write Scala at night
  3. 3. What is Metascala? ● A JVM ● in 3000 lines of Scala ● Which can load & interpret java programs ● And can interpret itself!
  4. 4. Size comparison ● Metascala: ~3,000 lines ● Avian JVM: ~80,000 lines ● OpenJDK: ~1,000,000 lines
  5. 5. Basic Usage Create a new metascala VM Plain Old Java Object Captured variables are serialized into VM’s environment Closure’s class file is given to VM to load/parse/execute Result is extracted from VM into host environment No global state Any other classes necessary to evaluate the closure are loaded from the current Classpath
  6. 6. It’s Metacircular! Need to give the outer VM more than the 1mb default heap VM inside a VM! Simpler program avoids initializing the scala/java std libraries, which takes forever under double-interpretation. Takes a while (~10s) to produce result
  7. 7. Limitations ● Single-threaded ● Limited IO ● Slowww
  8. 8. Performance Comparison ● OpenJDK: 1.0x ● Metascala: ~100x ● Meta-Metascala: ~10000x
  9. 9. Why Metascala? ● Fun to explore the innards of the JVM ● An almost-fully secure Java runtime! ● Small size makes fiddling fun
  10. 10. Why Metascala? ● Fun to explore the innards of the JVM ● An almost-fully secure Java runtime! ● Small size makes fiddling fun
  11. 11. Quick Tour Immutable Defs: ~380 loc Native Bindings: ~650 loc Bytecode SSA transform: ~650 loc Runtime data structures: 820 loc Binary heap & Copying GC: 132 loc DIY scala-pickling: 132 loc “This is a VM”: 243 loc
  12. 12. Quick Tour: Tests Tests for basic Java features GC Fuzz-tests Test Metacircularity! Scala std lib usage
  13. 13. What’s a Heap? Fig 1. A Heap
  14. 14. What’s a Garbage Collector? Blit (copy) all roots to new heap Stop when you’ve scanned everything Not pseudocode Scan the already copied things for more things and copy them too
  15. 15. Why Metascala? ● Fun to explore the innards of the JVM ● An almost-fully secure Java runtime! ● Small size makes fiddling fun
  16. 16. Limited Instruction Count
  17. 17. And Limited Memory! Not an OOM Error! We throw this ourselves
  18. 18. Explicitly defined capabilities Every external call has to be explicitly defined and enabled
  19. 19. Security Characteristics ● ● ● ● Finite instruction count Finite memory Well-defined interface to outside world Doesn’t rely on Java Security Model at all! ● Still some holes…
  20. 20. Security Holes ● ● ● ● ● Classloader can read from anywhere Time spent classloading not accounted Memory spent on classes not accounted GC time not accounted “native” methods’ time/memory not accounted
  21. 21. Basic Problem Outside World User code resource consumption is bounded VM’s runtime resource usage can be made to grow arbitrarily large Classes Native method calls User Code Runtime Data Structures Garbage Collector
  22. 22. Possible Solution Put a VM Inside a VM! Works, ... but 10000x slowdown Outside World Fixed Unaccounted Costs Outside World Classes Native method calls User Code Runtime Data Structures Garbage Collector
  23. 23. Another Possible Solution Move more components into virtual runtime Difficult to bootstrap correctly WIP Outside World Native method calls Classes Garbage Collector User Code Runtime Data Structures
  24. 24. Why Metascala? ● Fun to explore the innards of the JVM ● An almost-fully secure Java runtime! ● Small size makes fiddling fun
  25. 25. Live Demo
  26. 26. Ugliness ● Long compile times ● Nasty JVM Interface ● Impossible Debugging
  27. 27. Long compile times ● ● 100 lines/s ● Twice as slow (50 lines/s) on my older machine!
  28. 28. Nasty JVM Interface Ideal World Initialized User Code Real World Linear Initialization Std Library Initialized User Code Std Library Clean Interfaces VM VM LazyInitialization means repeated dives back into lib/native code Nasty Language VM Interface
  29. 29. Java’s dirty little secret The Verbosity of Java with the Safety of C WTF! I’d never use these things!
  30. 30. You probably do What happens if you don’t have them Almost every Java program ever uses these things.
  31. 31. Next Steps ● Maximize correctness ○ Implement Threads & IO ○ Fix bugs (GC, native calls, etc.) ● Solidify security characteristics ○ Still plenty of unaccounted-for memory/processing ○ Some can be hosted “within” VM itself ● Simplify Std-Lib/VM interface ○ Try using Android Std Lib?
  32. 32. Possible Experiments ● Native codegen instead of an interpreter? ○ Generate/exec native code through JNI ○ Heap is already a binary blob that can be easily passed to native code ● Bytecode transforms and optimizations? ○ Already in SSA form ● Continuations, Isolates, Value Classes? ● Port the whole thing to Scala.Js?
  33. 33. Metascala: a tiny DIY JVM Ask me about: ● Single Static Assignment form ● Copying Garbage Collection ● sun.misc.Unsafe ● Warts of the .class file format ● Capabilities-based security ● Abandoned approaches
  1. Gostou de algum slide específico?

    Recortar slides é uma maneira fácil de colecionar informações para acessar mais tarde.

×