Write Your Own JVM Compiler


Published on

Published in: Technology

Write Your Own JVM Compiler

  1. 1. Write your own JVM compiler OSCON Java 2011 Ian Dees @undeesHi, I’m Ian. I’m here to show that there’s never been a better time than now to write your owncompiler.
  2. 2. the three paths Abstraction compilers TimeWhy would someone want to do this? That depends on the direction you’re coming from.
  3. 3. Abstraction compilers xor eax, eax e- TimeIf, like me, you came from a hardware background, compilers are the next logical step up the ladderof abstraction from programming.
  4. 4. logic λx.x grammars Abstraction compilers xor eax, eax e- TimeIf you come from a computer science background, compilers and parsers are practical tools to getthings done (such as reading crufty ad hoc data formats at work).
  5. 5. logic λx.x grammars Abstraction compilers xor eax, eax e- TimeIf you’re following the self-made path, compilers are one of many computer science topics you maycome to naturally on your travels.
  6. 6. where we’re headedBy the end of this talk, we’ll have seen the basics of how to use JRuby to create a compiler for afictional programming language.
  7. 7. Thnad a fictional programming languageSince all the good letters for programming languages are taken (C, D, K, R, etc.), let’s use a fictionalletter. “Thnad,” a letter that comes to us from Dr. Seuss, will do nicely.
  8. 8. function factorial(n) { if(eq(n, 1)) { 1 } else { times(n, factorial(minus(n, 1))) } } print(factorial(4))Here’s a simple Thnad program. As you can see, it has only integers, functions, and conditionals.No variables, type definitions, or anything else. But this bare minimum will keep us plenty busy forthe next half hour.
  9. 9. hand-waving before we see the codeBefore we jump into the code, let’s look at the general view of where we’re headed.
  10. 10. minus(n, 1) { :funcall => { :name => minus }, { :args => [ { :arg => { :name => n } }, { :arg => { :number => 1 } } } ] } :funcall :name :args ‘minus’ :arg :arg :name :number ‘n’ ’1’We need to take a piece of program text like “minus(n, 1)” and break it down into something like asentence’s parts of speech: this is a function call, this is a parameter, and so on. The code in themiddle is how we might represent this breakdown in Ruby, but really we should be thinking about itgraphically, like the tree at the bottom.
  11. 11. minus(n, 1) Funcall.new minus, Usage.new(n), Number.new(1) Funcall @name @args ‘minus’ Usage Number @name @value ‘n’ 1The representation on the previous slide used plain arrays, strings, and so on. It will be helpful totransform these into custom Ruby objects that know how to emit JVM bytecode. Here’s the kind ofthing we’ll be aiming for. Notice that the tree looks really similar to the one we just saw—the onlydifferences are the names in the nodes.
  12. 12. 1. parse 2. transform 3. emitEach stage of our compiler will transform one of our program representations—original programtext, generic Ruby objects, custom Ruby objects, JVM bytecode—into the next.
  13. 13. 1. parseFirst, let’s look at parsing the original program text into generic Ruby objects.
  14. 14. describe Thnad::Parser do before do @parser = Thnad::Parser.new end it reads a number do expected = { :number => 42 } @parser.number.parse(42).must_equal expected end endHere’s a unit test written with minitest, a unit testing framework that ships with Ruby 1.9 (andtherefore JRuby). We want the text “42” to be parsed into the Ruby hash shown here.
  15. 15. tool #1: Parslet http://kschiess.github.com/parsletHow are we going to parse our programming language into that internal representation? By usingParslet, a parsing tool written in Ruby. For the curious, Parslet is a PEG (Parsing ExpressionGrammar) parser.
  16. 16. require parsletmodule Thnad class Parser < Parslet::Parser # gotta see it all rule(:space) { match(s).repeat(1) } rule(:space?) { space.maybe } # numeric values rule(:number) { match([0-9]).repeat(1).as(:number) >> space? } endendHere’s a set of Parslet rules that will get our unit test to pass. Notice that the rules read a little bitlike a mathematical notation: “a number is a sequence of one or more digits, possibly followed bytrailing whitespace.”
  17. 17. 2. transformThe next step is to transform the Ruby arrays and hashes into more sophisticated objects that will(eventually) be able to generate bytecode.
  18. 18. input = { :number => 42 } expected = Thnad::Number.new(42) @transform.apply(input).must_equal expectedHere’s a unit test for that behavior. We start with the result of the previous step (a Ruby hash) andexpect a custom Number object.
  19. 19. tool #2: Parslet (again)How are we going to transform the data? We’ll use Parslet again.
  20. 20. module Thnad class Transform < Parslet::Transform rule(:number => simple(:value)) { Number.new(value.to_i) } end endThis rule takes any simple Ruby hash with a :number key and transforms it into a Number object.
  21. 21. require parslet module Thnad class Number < Struct.new(:value) # more to come... end endOf course, we’ll need to define the Number class. Once we’ve done so, the unit tests will pass.
  22. 22. 3. emitFinally, we need to emit JVM bytecode.
  23. 23. bytecode ldc 42Here’s the bytecode we want to generate when our compiler encounters a Number object with 42 asthe value. It just pushes the value right onto the stack.
  24. 24. describe Emit do before do @builder = mock @context = Hash.new end it emits a number do input = Thnad::Number.new 42 @builder.expects(:ldc).with(42) input.eval @context, @builder end endAnd here’s the unit test that specifies this behavior. We’re using mock objects to say in effect,“Imagine a Ruby object that can generate bytecode; our compiler should call it like this.”
  25. 25. require parslet module Thnad class Number < Struct.new(:value) def eval(context, builder) builder.ldc value end end endOur Number class will now need an “eval” method that takes a context (useful for carrying aroundinformation such as parameter names) and a bytecode builder (which we’ll get to on the next slide).
  26. 26. tool #3: BiteScript https://github.com/headius/bitescriptA moment ago, we saw that we’ll need a Ruby object that knows how to emit JVM bytecode. TheBiteScript library for JRuby will give us just such an object.
  27. 27. iload 0 ldc 1 invokestatic example, minus, int, int, int ireturnThis, for example, is a chunk of BiteScript that writes a Java .class file containing a function call—inthis case, the equivalent of the “minus(n, 1)” Thnad code we saw earlier.
  28. 28. $ javap -c example Compiled from "example.bs" public class example extends java.lang.Object{ public static int doSomething(int); Code: 0: iload_0 1: iconst_1 2: invokestatic #11; //Method minus:(II)I 5: ireturnIf you run that through the BiteScript compiler and then use the standard Java tools to view theresulting bytecode, you can see it’s essentially the same program.
  29. 29. enough hand-waving let’s see the code!Armed with this background information, we can now jump into the code.
  30. 30. a copy of our home game https://github.com/undees/thnadAt this point in the talk, I fired up TextMate and navigated through the project, choosing thedirection based on whim and audience questions. You can follow along at home by looking at thevarious commits made to this project on GitHub.
  31. 31. more on BiteScript http://www.slideshare.net/CharlesNutter/ redev-2010-jvm-bytecode-for-dummiesThere are two resources I found helpful during this project. The first was Charles Nutter’s primer onJava bytecode, which was the only JVM reference material necessary to write this compiler.
  32. 32. see also http://createyourproglang.com Marc-André CournoyerThe second helpful resource was Marc-André Cournoyer’s e-book How to Create Your OwnFreaking Awesome Programming Language. It’s geared more toward interpreters than compilers,but was a great way to check my progress after I’d completed the major stages of this project.
  33. 33. hopesMy hope for us as a group today is that, after our time together, you feel it might be worth givingone of these tools a shot—either for a fun side project, or to solve some protocol parsing need youmay encounter one day in your work.