Thnad's Revenge

At a previous JRubyConf, we talked about Thnad, a fictional programming language. Thnad served as a vehicle to explore the joy of building a compiler using JRuby, BiteScript, Parslet, and other tools. Now, Thnad is back with a second runtime: Rubinius. Come see the Rubinius environment through JRuby eyes. Together, we'll see how to grapple with multiple instruction sets and juggle contexts without going cross-eyed.


Transcript

  • 1. Welcome to “Thnad’s Revenge,” a programming language implementation tale in three acts.
  • 2. Not to be confused with...
  • 3. http://en.wikipedia.org/wiki/Yars_Revenge
    ...Yars’ Revenge, the awesome Atari video game from the ’80s.
  • 4. Cucumber Recipes: Ian Dees with Aslak Hellesøy and Matt Wynne; pragprog/titles/JRUBY, discount code: JRubyIanDees
    Before we get to the talk, let me make a couple of quick announcements. First, we’re updating the JRuby book this summer with a JRuby 1.7-ready PDF. To celebrate that, we’re offering a discount code on the book during the conference. Second, I’m working on a new book with the Cucumber folks, which has some JRuby/JVM stuff in it. If you’d like to be a tech reviewer, please find me after this talk.
  • 5. I. Meet Thnad; II. Enter the Frenemy; III. Thnad’s Revenge
    (with apologies to Ira Glass) Act I, Meet Thnad, in which we encounter Thnad, a programming language built with JRuby and designed not for programmer happiness, but for implementer happiness. Act II, Enter the Frenemy, in which we meet a new Ruby runtime. Act III, Thnad’s Revenge, in which we port Thnad to run on the Rubinius runtime and encounter some surprises along the way.
  • 6. I. Meet Thnad
    Thnad is a programming language I created last summer as an excuse to learn some fun JRuby tools and see what it’s like to write a compiler.
  • 7. The name comes from a letter invented by Dr. Seuss in his book, “On Beyond Zebra.” Since most of the real letters are already taken by programming languages, a fictional one seems appropriate.
  • 8. A Fictional Programming Language Optimized for Implementer Happiness
    Just as Ruby is optimized for programmer happiness, Thnad is optimized for implementer happiness. It was designed to be implemented with a minimum of time and effort, and a maximum amount of fun.
  • 9.
      function factorial(n) {
        if (eq(n, 1)) {
          1
        } else {
          times(n, factorial(minus(n, 1)))
        }
      }

      print(factorial(4))
    Here’s a sample Thnad program demonstrating all the major features. Thnad has integers, functions, conditionals, and... not much else. These minimal features were easy to add, thanks to the great tools available in the JRuby ecosystem (and other ecosystems, as we’ll see).
  • 10. Thnad Features: 1. Names and Numbers; 2. Function Calls; 3. Conditionals; 4. Function Definitions
    In the next few minutes, we’re going to trace through each of these four language features, from parsing the source all the way to generating the final binary. We won’t show every single grammar rule, but we will hit the high points.
  • 11. As Tom mentioned in his talk, there are a number of phases a piece of source code goes through during compilation.
  • 12. Stages of Parsing: tokenize, parse, transform, emit
    These break down into four main stages in a typical language: finding the tokens or parts of speech of the text, parsing the tokens into an in-memory tree, transforming the tree, and generating the bytecode. We’re going to look at each of Thnad’s major features in the context of these stages.
  • 13. 1. Names and Numbers
    First, let’s look at the easiest language feature: numbers and function parameters.
  • 14. 42 parses to {:number => 42} (tree: root → :number → "42")
    Our parser needs to transform this input text into some kind of Ruby data structure.
  • 15. Parslet (kschiess.github.com/parslet)
    I used a library called Parslet for that. Parslet handles the first two stages of compilation (tokenizing and parsing) using a Parsing Expression Grammar, or PEG. PEGs are like regular expressions attached to blocks of code. They sound like a hack, but there’s solid compiler theory behind them.
  • 16. 42 parses to {:number => 42}
      rule(:number) { match('[0-9]').repeat(1).as(:number) >> space? }
    The rule at the bottom of the page is Parslet’s notation for matching one or more digits followed by an optional space.
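If you don't have Parslet handy, here is a rough plain-Ruby approximation of what this rule accepts. It uses the standard library's `StringScanner` instead of Parslet's combinators. The `parse_number` method is hypothetical. The sketch also treats the rule as the whole grammar, so any leftover input is rejected. That is an assumption made to keep it self-contained.

```ruby
require 'strscan'

# Hypothetical stand-in for the Parslet rule
#   rule(:number) { match('[0-9]').repeat(1).as(:number) >> space? }
# Returns {number: "<digits>"} when the whole input is a number with
# optional trailing whitespace, or nil otherwise.
def parse_number(text)
  scanner = StringScanner.new(text)
  digits = scanner.scan(/[0-9]+/)  # match('[0-9]').repeat(1)
  return nil unless digits
  scanner.scan(/\s+/)              # space?: optional, value discarded
  return nil unless scanner.eos?   # treat this rule as the grammar root
  { number: digits }               # .as(:number) names the captured digits
end

puts parse_number("42 ")[:number]  # prints 42
p parse_number("x42")              # prints nil
```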
  • 17. {:number => 42} becomes Thnad::Number.new(42) (tree: root → Thnad::Number → :value → 42)
      rule(:number => simple(:value)) { Number.new(value.to_i) }
    Now for the third stage, transformation. We could generate the bytecode straight from the original tree, using a bunch of hard-to-test case statements. But it would be nicer to have a specific Ruby class for each Thnad language feature. The rule at the bottom of this slide tells Parslet to transform a Hash with a key called :number into an instance of a Number class we provide.
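Here is a minimal sketch of what this one transform rule does, written as a hand-rolled recursive walk instead of `Parslet::Transform`. The `transform` method is hypothetical and only knows the `:number` rule. The local `Number` struct stands in for `Thnad::Number`, and a plain string stands in for the slice Parslet would capture.

```ruby
# Local stand-in for Thnad::Number.
Number = Struct.new(:value)

# Hypothetical stand-in for the Parslet rule
#   rule(:number => simple(:value)) { Number.new(value.to_i) }
# Recursively rewrites {number: "42"} into Number.new(42) and leaves
# every other node's shape alone.
def transform(tree)
  case tree
  when Hash
    if tree.keys == [:number] && tree[:number].is_a?(String)
      Number.new(tree[:number].to_i)
    else
      tree.transform_values { |v| transform(v) }
    end
  when Array
    tree.map { |v| transform(v) }
  else
    tree
  end
end

node = transform({ number: "42" })
puts node.value  # prints 42
```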
  • 18. BiteScript (github/headius/bitescript)
    The final stage, outputting bytecode, is handled by the BiteScript library, which is basically a domain-specific language for emitting JVM opcodes.
  • 19.
      main do
        ldc 42
        ldc 1
        invokestatic :Example, :baz, [int, int, int]
        returnvoid
      end
    Here’s an example, just to get an idea of the flavor. To call a method, you just push the arguments onto the stack and then call a specific opcode, in this case invokestatic. The VM you’re writing for is aware of classes, interfaces, and so on. You don’t have to implement method lookup like you would with plain machine code.
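To make "push the arguments, then invoke" concrete, here is a toy stack machine in plain Ruby. It is neither the JVM nor BiteScript. The instruction tuples, the `run` method and the body given to `baz` (adding its two arguments) are all hypothetical. They are chosen only to show how a static call consumes its arguments from the operand stack.

```ruby
# Toy model of JVM-style static calls; this method table is hypothetical.
STATICS = {
  baz: ->(a, b) { a + b }  # pretend Example.baz(int, int) returns int
}

# Each instruction is [opcode, *operands].
def run(instructions)
  stack = []
  instructions.each do |op, *operands|
    case op
    when :ldc
      stack.push(operands.first)   # push a constant
    when :invokestatic
      name, argc = operands
      args = stack.pop(argc)       # top argc values, in the order they were pushed
      stack.push(STATICS.fetch(name).call(*args))
    when :returnvoid
      return stack
    else
      raise "unknown opcode #{op}"
    end
  end
  stack
end

result = run([[:ldc, 42], [:ldc, 1], [:invokestatic, :baz, 2], [:returnvoid]])
puts result.inspect  # prints [43]
```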
  • 20. “JVM Bytecode for Dummies,” Charles Nutter, Øredev 2010 (slideshare/CharlesNutter/redev-2010-jvm-bytecode-for-dummies)
    When I first saw BiteScript, I thought it was something you’d only need if you were doing deep JVM hacking. But when I read the slides from Charlie’s presentation at Øredev, it clicked. This library takes me way back to my college days, when we’d write assembler programs for a really simple instruction set like MIPS. BiteScript evokes that same kind of feeling. I’d always thought the JVM would have a huge, crufty instruction set, but it’s actually quite manageable to keep the most important parts of it in your head.
  • 21.
      class Number < Struct.new :value
        def eval(context, builder)
          builder.ldc value
        end
      end
    We can generate the bytecode any way we want. One simple way is to give each of our classes an eval() method that takes a BiteScript generator and calls various methods on it to generate JVM instructions.
  • 22.
      class Name < Struct.new :name
        def eval(context, builder)
          param_names = context[:params] || []
          position = param_names.index(name)
          raise "Unknown parameter #{name}" unless position
          builder.iload position
        end
      end
    Dealing with passed-in parameters is nearly as easy as dealing with raw integers. We just look up the parameter’s position by name, and then push the nth parameter onto the stack.
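`eval()` only calls methods on whatever builder it receives, so these emitters can be exercised without BiteScript. The sketch below hands `Number` and `Name` (copied from the slides) a hypothetical `RecordingBuilder`. That builder logs each opcode instead of writing bytecode.

```ruby
# Hypothetical stand-in for a BiteScript builder: records opcodes
# instead of emitting bytecode.
class RecordingBuilder
  attr_reader :ops

  def initialize
    @ops = []
  end

  def ldc(value)
    @ops << [:ldc, value]
  end

  def iload(index)
    @ops << [:iload, index]
  end
end

class Number < Struct.new(:value)
  def eval(context, builder)
    builder.ldc value
  end
end

class Name < Struct.new(:name)
  def eval(context, builder)
    param_names = context[:params] || []
    position = param_names.index(name)
    raise "Unknown parameter #{name}" unless position
    builder.iload position
  end
end

builder = RecordingBuilder.new
context = { params: %w[n x] }
Number.new(42).eval(context, builder)
Name.new("x").eval(context, builder)
puts builder.ops.inspect  # prints [[:ldc, 42], [:iload, 1]]
```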
  • 23. 2. Function Calls
    The next major feature is function calls. Once we have those, we will be able to run a trivial Thnad program.
  • 24. baz(42, foo) parses to:
      {:funcall => {:name => 'baz',
                    :args => [{:arg => {:number => 42}},
                              {:arg => {:name => 'foo'}}]}}
    We’re going to move a little faster here, to leave time for Rubinius. Here, we want to transform this source code into this Ruby data structure representing a function call.
  • 25.
      Thnad::Funcall.new 'foo', [Thnad::Number.new(42)]
    Now, we want to transform generic Ruby data structures into purpose-built ones that we can attach bytecode-emitting behavior to.
  • 26.
      class Funcall < Struct.new :name, :args
        def eval(context, builder)
          args.each { |a| a.eval(context, builder) }
          types = [builder.int] * (args.length + 1)
          builder.invokestatic builder.class_builder, name, types
        end
      end
    The bytecode for a function call is really simple in BiteScript. All functions in Thnad are static methods on a single class.
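Recording the calls `Funcall#eval` makes on its builder shows the opcodes and the call signature it produces. In this sketch `RecordingBuilder` is hypothetical. Its `int` returns the symbol `:int` and its `class_builder` returns `:Example`. A real BiteScript builder would supply the JVM int type and the class under construction instead.

```ruby
# Hypothetical recording builder standing in for BiteScript.
class RecordingBuilder
  attr_reader :ops

  def initialize
    @ops = []
  end

  def int
    :int       # placeholder for the JVM int type
  end

  def class_builder
    :Example   # placeholder for the class being generated
  end

  def ldc(value)
    @ops << [:ldc, value]
  end

  def invokestatic(klass, name, types)
    @ops << [:invokestatic, klass, name, types]
  end
end

class Number < Struct.new(:value)
  def eval(context, builder)
    builder.ldc value
  end
end

class Funcall < Struct.new(:name, :args)
  def eval(context, builder)
    args.each { |a| a.eval(context, builder) }
    # One int for the return type, plus one per argument.
    types = [builder.int] * (args.length + 1)
    builder.invokestatic builder.class_builder, name, types
  end
end

b = RecordingBuilder.new
Funcall.new('baz', [Number.new(42), Number.new(1)]).eval({}, b)
b.ops.each { |op| puts op.inspect }
```

The argument pushes come out first and the `invokestatic` last, with a three-int signature for a two-argument call.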
  • 27. 3. Conditionals
    The first two features we’ve defined are enough to write simple programs like print(42). The next two features will let us add conditionals and custom functions.
  • 28.
      if (0) {
        42
      } else {
        667
      }
    parses to:
      {:cond     => {:number => 0},
       :if_true  => {:body => {:number => 42}},
       :if_false => {:body => {:number => 667}}}
    A conditional consists of the “if” keyword, followed by a condition in parentheses, a body of code inside braces, then the “else” keyword, followed by another body of code in braces.
  • 29.
      Thnad::Conditional.new Thnad::Number.new(0),
                             Thnad::Number.new(42),
                             Thnad::Number.new(667)
    Here’s the transformed tree representing a set of custom Ruby classes.
  • 30.
      class Conditional < Struct.new :cond, :if_true, :if_false
        def eval(context, builder)
          cond.eval context, builder
          builder.ifeq :else
          if_true.eval context, builder
          builder.goto :endif
          builder.label :else
          if_false.eval context, builder
          builder.label :endif
        end
      end
    The bytecode emitter for conditionals has a new twist. The Conditional struct points to three other Thnad nodes. It needs to eval() them at the right time to emit their bytecode in between all the zero checks and gotos.
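To see the jumps work, the sketch below records the JVM-style ops emitted by the slide's `Conditional` and runs them on a tiny interpreter. The `RecordingBuilder`, `run` and `compile` names are hypothetical. The interpreter models only `ldc`, `ifeq`, `goto` and `label`. Its `ifeq` pops the condition and jumps when it is zero.

```ruby
# Hypothetical builder that records JVM-style ops, including symbolic labels.
class RecordingBuilder
  attr_reader :ops
  def initialize; @ops = []; end
  def ldc(v);   @ops << [:ldc, v]; end
  def ifeq(l);  @ops << [:ifeq, l]; end
  def goto(l);  @ops << [:goto, l]; end
  def label(l); @ops << [:label, l]; end
end

class Number < Struct.new(:value)
  def eval(context, builder)
    builder.ldc value
  end
end

class Conditional < Struct.new(:cond, :if_true, :if_false)
  def eval(context, builder)
    cond.eval context, builder
    builder.ifeq :else            # jump to :else when the condition is zero
    if_true.eval context, builder
    builder.goto :endif
    builder.label :else
    if_false.eval context, builder
    builder.label :endif
  end
end

# Toy interpreter: finds label positions, then runs until it falls off the end.
def run(ops)
  labels = {}
  ops.each_with_index { |(op, arg), i| labels[arg] = i if op == :label }
  stack = []
  pc = 0
  while pc < ops.length
    op, arg = ops[pc]
    pc += 1
    case op
    when :ldc   then stack.push(arg)
    when :ifeq  then pc = labels.fetch(arg) if stack.pop.zero?
    when :goto  then pc = labels.fetch(arg)
    when :label then nil          # labels are no-ops at run time
    end
  end
  stack
end

def compile(cond)
  b = RecordingBuilder.new
  Conditional.new(Number.new(cond), Number.new(42), Number.new(667)).eval({}, b)
  b.ops
end

puts run(compile(0)).inspect  # prints [667]
puts run(compile(1)).inspect  # prints [42]
```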
  • 31. 4. Function Definitions
    On to the final piece of Thnad: defining new functions.
  • 32.
      function foo(x) {
        5
      }
    parses to:
      {:func   => {:name => 'foo'},
       :params => {:param => {:name => 'x'}},
       :body   => {:number => 5}}
    A function definition looks a lot like a function call, but with a body attached to it.
  • 33.
      Thnad::Function.new 'foo', [Thnad::Name.new('x')], Thnad::Number.new(5)
    Here’s the transformation we want to perform for this language feature.
  • 34.
      class Function < Struct.new :name, :params, :body
        def eval(context, builder)
          param_names = [params].flatten.map(&:name)
          context[:params] = param_names
          types = [builder.int] * (param_names.count + 1)

          builder.public_static_method(self.name, [], *types) do |method|
            self.body.eval(context, method)
            method.ireturn
          end
        end
      end
    Since all Thnad parameters and return types are integers, emitting a function definition is really easy. We count the parameters so that we can give the JVM a correct signature. Then, we just pass a block to the public_static_method helper, a feature of BiteScript that will inspire the Rubinius work later on.
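One detail in `Function#eval` is easy to miss. `[params].flatten` lets the same code handle either a lone parameter node or an array of them. The parse tree for `function foo(x)` shows a single parameter as a lone hash, which is presumably why the normalization is there. That motivation is an assumption. The hypothetical `signature_for` helper below isolates the normalization and the resulting all-int type list. It uses a plain `Name` struct, not Thnad's.

```ruby
Name = Struct.new(:name)

# Hypothetical helper mirroring the first lines of Function#eval:
# normalize params to an array of names, then build one :int for the
# return type plus one :int per parameter.
def signature_for(params)
  param_names = [params].flatten.map(&:name)  # a Struct is not flattened
  types = [:int] * (param_names.count + 1)
  [param_names, types]
end

p signature_for(Name.new('x'))
p signature_for([Name.new('a'), Name.new('b')])
```

`Array#flatten` only splices in objects that convert to arrays, so a lone `Name` struct survives intact while an array of them is unwrapped.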
  • 35. Compiler
    We’ve seen how to generate individual chunks of bytecode; how do they all get stitched together into a .class file?
  • 36.
      builder = BiteScript::FileBuilder.build(@filename) do
        public_class classname, object do |klass|
          # ...
          klass.public_static_method 'main', [], void, string[] do |method|
            context = Hash.new
            exprs.each do |e|
              e.eval(context, method)
            end
            method.returnvoid
          end
        end
      end
    Here’s the core of class generation. We output a standard Java main() function...
  • 37.
      builder = BiteScript::FileBuilder.build(@filename) do
        public_class classname, object do |klass|
          # ...
          klass.public_static_method 'main', [], void, string[] do |method|
            context = Hash.new
            exprs.each do |e|
              e.eval(context, method)
            end
            method.returnvoid
          end
        end
      end
    ...inside which we eval() our Thnad expressions (not counting function definitions) one by one.
  • 38. Built-ins: plus, minus, times, eq, print
    Thnad ships with a few basic arithmetic operations, plus a print() function. Let’s look at one of those now.
  • 39.
      public_static_method 'minus', [], int, int, int do
        iload 0
        iload 1
        isub
        ireturn
      end
    Here’s the definition of minus(). It just pushes its two arguments onto the stack and then subtracts them. The rest of the built-ins are nearly identical to this one, so we won’t show them here.
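Here is a toy interpreter for the four opcodes `minus()` uses. It is plain Ruby, not the JVM. The `run_method` name and the tuple encoding are hypothetical. `locals` plays the role of the method's parameter slots. `isub` subtracts the top of the stack from the value beneath it, matching the JVM's operand order.

```ruby
# Toy interpreter for the opcodes in the minus() built-in.
def run_method(ops, locals)
  stack = []
  ops.each do |op, arg|
    case op
    when :iload then stack.push(locals[arg])  # push parameter slot arg
    when :isub
      right = stack.pop                       # second operand is on top
      left = stack.pop
      stack.push(left - right)
    when :ireturn then return stack.pop
    else raise "unsupported opcode #{op}"
    end
  end
  raise "fell off the end without ireturn"
end

MINUS = [[:iload, 0], [:iload, 1], [:isub], [:ireturn]]

puts run_method(MINUS, [10, 3])  # prints 7
```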
  • 40. II. Enter the Frenemy
    So that’s a whirlwind tour of Thnad. Last year, I was telling someone about this project (it was either Shane Becker or Brian Ford, I think), and he said,...
  • 41. Rubinius
    ...“Hey, you should port this to Rubinius!” I thought, “Hmm, why not? Sounds fun.” Let’s take a look at this other runtime that has sprung up as a rival for Thnad’s affections.
  • 42. Ruby in Ruby
      • As much as performance allows
      • Initially 100%, now around half (?)
      • Core in C++ / LLVM
      • Tons in Ruby: primitives, parser, bytecode
    The goal of Rubinius is to implement Ruby in Ruby as much as performance allows. Quite a lot of functionality you’d think would need to be in C is actually in Ruby.
  • 43. RubySpec, FFI: Brought to you by Rubinius (Thank you!)
    We have Rubinius to thank for the executable Ruby specification that all Rubies are now judged against, and for the excellent foreign-function interface that lets you call C code in a way that’s compatible with at least four Rubies.
  • 44. Looking Inside Your Code
    Rubinius also has tons of mechanisms for looking inside your code, which was very helpful when I needed to learn what bytecode I’d need to output to accomplish a particular task in Thnad.
  • 45.
      class Example
        def add(a, b)
          a + b
        end
      end
    For example, with this class,...
  • 46. AST
      $ rbx compile -S example.rb
      [:script, [:class, :Example, nil, [:scope, [:block, [:defn, :add, [:args, :a, :b], [:scope, [:block, [:call, [:lvar, :a], :+, [:arglist, [:lvar, :b]]]]]]]]]]
    ...you can get a Lisp-like representation of the syntax tree,...
  • 47. Bytecode
      $ rbx compile -B example.rb
      ...
      ================= :add =================
      Arguments: 2 required, 2 total
      Locals: 2: a, b
      Stack size: 4
      Lines to IP: 2: -1..-1, 3: 0..6

      0000: push_local 0 # a
      0002: push_local 1 # b
      0004: meta_send_op_plus :+
      0006: ret
      ----------------------------------------
    ...or a dump of the actual bytecode for the Rubinius VM.
  • 48. “Ruby Platform Throwdown,” moderated by Dr Nic, 2011 (vimeo/26773441)
    For more on the similarities and differences between Rubinius and JRuby, see the throwdown video moderated by Dr Nic.
  • 49. III: Thnad’s Revenge
    Now that we’ve gotten to know Rubinius a little...
  • 50. Let’s port Thnad to Rubinius!
    ...let’s see what it would take to port Thnad to it.
  • 51. Our Guide Through the Wilderness: @brixen (photo: JSConf US)
    Brian Ford was a huge help during this effort, answering tons of my “How do I...?” questions in an awesome Socratic way (“Let’s take a look at the Generator class source code....”)
  • 52. Same parser; same AST transformation; different bytecode (but similar bytecode ideas)
    Because the Thnad syntax is unchanged, we can reuse the parser and syntax transformation. All we need to change is the bytecode output. And even that’s not drastically different.
  • 53. Thnad’s Four Features, Revisited
    Let’s go back through Thnad’s four features in the context of Rubinius.
  • 54. 1. Names and Numbers
    First, function parameters and integers.
  • 55.
      JVM             RBX
      # Numbers:      # Numbers:
      ldc 42          push 42
      # Names:        # Names:
      iload 0         push_local 0
    See how similar the JVM and Rubinius bytecode is for these basic features?
  • 56.
      class Number < Struct.new :value
        def eval(context, builder)
          builder.push value
        end
      end
    All we had to change was the name of the opcode both for numbers...
  • 57.
      class Name < Struct.new :name
        def eval(context, builder)
          param_names = context[:params] || []
          position = param_names.index(name)
          raise "Unknown parameter #{name}" unless position
          builder.push_local position
        end
      end
    ...and for parameter names.
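Side by side, the two runtimes' emitters differ only in opcode names. The sketch below makes that explicit by feeding JVM-flavored and Rubinius-flavored copies of the slides' `Number` and `Name` to the same hypothetical `Recorder`. The `JvmNumber`, `JvmName`, `RbxNumber` and `RbxName` class names are invented here so that both sets can live in one file.

```ruby
# Hypothetical recorder: logs each opcode call as [opcode, operand].
class Recorder
  attr_reader :ops

  def initialize
    @ops = []
  end

  # Only the four single-operand opcodes these emitters use.
  %i[ldc iload push push_local].each do |opcode|
    define_method(opcode) { |operand| @ops << [opcode, operand] }
  end
end

# JVM emitters, as on the BiteScript slides.
class JvmNumber < Struct.new(:value)
  def eval(context, builder)
    builder.ldc value
  end
end

class JvmName < Struct.new(:name)
  def eval(context, builder)
    position = (context[:params] || []).index(name)
    raise "Unknown parameter #{name}" unless position
    builder.iload position
  end
end

# Rubinius emitters: same logic, different opcode names.
class RbxNumber < Struct.new(:value)
  def eval(context, builder)
    builder.push value
  end
end

class RbxName < Struct.new(:name)
  def eval(context, builder)
    position = (context[:params] || []).index(name)
    raise "Unknown parameter #{name}" unless position
    builder.push_local position
  end
end

context = { params: %w[a b] }
jvm = Recorder.new
rbx = Recorder.new
[JvmNumber.new(42), JvmName.new('b')].each { |node| node.eval(context, jvm) }
[RbxNumber.new(42), RbxName.new('b')].each { |node| node.eval(context, rbx) }

puts jvm.ops.inspect  # prints [[:ldc, 42], [:iload, 1]]
puts rbx.ops.inspect  # prints [[:push, 42], [:push_local, 1]]
```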
  • 58. 2. Function Calls
    Function calls were similarly easy.
  • 59.
      JVM                                     RBX
                                              push_const :Example
      ldc 42                                  push 42
      ldc 1                                   push 1
      invokestatic #2; //Method add:(II)I     send_stack #<CM>, 2
    In Rubinius, there are no truly static methods. We are calling the method on a Ruby object (namely, an entire Ruby class), so we have to push the name of that class onto the stack first. The other big difference is that in Rubinius, we don’t just push the method name onto the stack; we push a reference to the compiled code itself. Fortunately, there’s a helper method to make this look more BiteScript-like.
  • 60.
      class Funcall < Struct.new :name, :args
        def eval(context, builder)
          builder.push_const :Thnad
          args.each { |a| a.eval(context, builder) }
          builder.allow_private
          builder.send name.to_sym, args.length
        end
      end
    Here’s how that difference affects the bytecode. Notice the allow_private() call? I’m not sure exactly why we need this. It may be an “onion in the varnish,” a reference to a story by Primo Levi in _The Periodic Table_.
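Here is a toy model of the stack discipline this emitter relies on. The receiver (here a module) is pushed first and the arguments after it. `send` then pops the arguments, followed by the receiver. This is plain Ruby, not the Rubinius VM. `ThnadLib`, `run` and the tuple encoding are hypothetical. `allow_private` is accepted but ignored, because the toy has no visibility checks.

```ruby
# Hypothetical module standing in for Thnad's compiled functions.
module ThnadLib
  def self.minus(a, b)
    a - b
  end
end

# Toy model of a Rubinius-style send: receiver first, then arguments.
def run(ops)
  stack = []
  ops.each do |op, *operands|
    case op
    when :push_const    then stack.push(Object.const_get(operands[0]))
    when :push          then stack.push(operands[0])
    when :allow_private then nil  # no visibility checks in this toy
    when :send
      name, argc = operands
      args = stack.pop(argc)      # arguments, in the order they were pushed
      receiver = stack.pop        # the receiver sits beneath them
      stack.push(receiver.public_send(name, *args))
    else raise "unsupported opcode #{op}"
    end
  end
  stack
end

ops = [[:push_const, :ThnadLib], [:push, 5], [:push, 3], [:allow_private], [:send, :minus, 2]]
puts run(ops).inspect  # prints [2]
```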
  • 61. flickr/black-and-white-prints/1366095561, flickr/ianfuller/76775606
    In the story, the workers at a varnish factory wondered why the recipe called for an onion. They couldn’t work out chemically why it would be needed, but it had always been one of the ingredients. It turned out that it was just a crude old-school thermometer: when the onion sizzled, the varnish was ready.
  • 62. 3. Conditionals
    On to conditionals.
  • 63.
      JVM:
         0: iconst_0
         1: ifeq 9
         4: bipush 42
         6: goto 12
         9: sipush 667
        12: ...
      RBX:
        37: push 0
        38: push 0
        39: send :==
        41: goto_if_false 47
        43: push 42
        45: goto 49
        47: push 667
        49: ...
    Here, the JVM directly supports an “if equal to zero” opcode, whereas in Rubinius we have to explicitly compare the item on the stack with zero.
  • 64.
      class Conditional < Struct.new :cond, :if_true, :if_false
        def eval(context, builder)
          else_label = builder.new_label
          endif_label = builder.new_label
          cond.eval context, builder
          builder.push 0
          builder.send :==, 1
          builder.goto_if_true else_label
          if_true.eval context, builder
          builder.goto endif_label
          else_label.set!
          if_false.eval context, builder
          endif_label.set!
        end
      end
    Labels are a little different in Rubinius, too; here’s what the bytecode for conditionals looks like now.
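The sketch below models labels created with `new_label` and placed with `set!` in plain Ruby. `Label`, `RecordingGenerator`, `run` and `compile` are hypothetical stand-ins, not Rubinius classes. The `Conditional` body is the slide's. The interpreter evaluates `:==` with an ordinary Ruby method call.

```ruby
# Hypothetical label: remembers where set! was called.
class Label
  attr_reader :position

  def initialize(generator)
    @generator = generator
  end

  def set!
    @position = @generator.ops.length  # index of the next instruction
  end
end

# Hypothetical recording generator with just the opcodes this slide uses.
class RecordingGenerator
  attr_reader :ops

  def initialize
    @ops = []
  end

  def new_label
    Label.new(self)
  end

  def push(value)
    @ops << [:push, value]
  end

  def send(name, argc)  # shadows Object#send, as builder.send on the slide implies
    @ops << [:send, name, argc]
  end

  def goto_if_true(label)
    @ops << [:goto_if_true, label]
  end

  def goto(label)
    @ops << [:goto, label]
  end
end

class Number < Struct.new(:value)
  def eval(context, builder)
    builder.push value
  end
end

class Conditional < Struct.new(:cond, :if_true, :if_false)
  def eval(context, builder)
    else_label = builder.new_label
    endif_label = builder.new_label
    cond.eval context, builder
    builder.push 0
    builder.send :==, 1
    builder.goto_if_true else_label
    if_true.eval context, builder
    builder.goto endif_label
    else_label.set!
    if_false.eval context, builder
    endif_label.set!
  end
end

# Toy interpreter for the recorded ops.
def run(ops)
  stack = []
  pc = 0
  while pc < ops.length
    op, *operands = ops[pc]
    pc += 1
    case op
    when :push then stack.push(operands[0])
    when :send
      name, argc = operands
      args = stack.pop(argc)
      receiver = stack.pop
      stack.push(receiver.public_send(name, *args))  # e.g. 0 == 0
    when :goto_if_true then pc = operands[0].position if stack.pop
    when :goto then pc = operands[0].position
    end
  end
  stack
end

def compile(n)
  gen = RecordingGenerator.new
  Conditional.new(Number.new(n), Number.new(42), Number.new(667)).eval({}, gen)
  gen.ops
end

puts run(compile(0)).inspect  # prints [667]
puts run(compile(3)).inspect  # prints [42]
```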
  • 65. 4. Function Definitions
    The trickiest part to implement was function definitions.
  • 66.
      JVM                          RBX
      public int add(int, int);    push_rubinius
      iload_1                      push :add
      iload_2                      push #<CM>
      iadd                         push_scope
      ireturn                      push_self
                                   push_const :Thnad
                                   send :attach_method, 4
    Remember that in Ruby, there’s no compile-time representation of a class. So rather than emitting a class definition, we emit code that creates a class at runtime.
  • 67.
      class Function < Struct.new :name, :params, :body
        def eval(context, builder)
          param_names = [params].flatten.map(&:name)
          context[:params] = param_names

          # create a new Rubinius::Generator
          builder.begin_method name.to_sym, params.count
          self.body.eval(context, builder.current_method)
          builder.current_method.ret
          builder.end_method
        end
      end
    The code to define a method in Rubinius requires spinning up a completely separate bytecode generator. I stuck all this hairy logic in a set of helpers to make it more BiteScript-like.
  • 68.
      class Rubinius::Generator
        def end_method
          # ...
          cm = @inner.package Rubinius::CompiledMethod
          push_rubinius
          push_literal inner.name
          push_literal cm
          push_scope
          push_const :Thnad
          send :attach_method, 4
          pop
        end
      end
    Here’s the most interesting part of those helpers. After the function definition is compiled, we push it onto the stack and tell Rubinius to attach it to our class.
  • 69. Compiler
    How does the compiled code make its way into a .rbc file?
  • 70.
      g = Rubinius::Generator.new
      # ...
      context = Hash.new
      exprs.each do |e|
        e.eval(context, g)
      end
      # ...
    As with JRuby, we create a bytecode generation object, then evaluate all the Thnad statements into it.
  • 71.
      main = g.package Rubinius::CompiledMethod
      Rubinius::CompiledFile.dump main, @outname, Rubinius::Signature, 18
    Finally, we tell Rubinius to marshal the compiled code to a .rbc file.
  • 72. Runner (new!)
    That means we now need a small script to unmarshal that compiled code and run it. This is new; on the Java runtime, we already have a runner: the java binary.
  • 73.
      #!/usr/bin/env rbx -rubygems

      (puts("Usage: #{} BINARY"); exit) if ARGV.empty?

      loader = Rubinius::CodeLoader.new(ARGV.first)
      method = loader.load_compiled_file(
        ARGV.first, Rubinius::Signature, 18)
      result = Rubinius.run_script(method)
    Here’s the entirety of the code to load and run a compiled Rubinius file.
  • 74. Built-ins
    As we’ve just seen, defining a function in Rubinius takes a lot of steps, even with helper functions to abstract away some of the hairiness.
  • 75.
      g.begin_method :minus, 2
      g.current_method.push_local 0
      g.current_method.push_local 1
      g.current_method.send :-, 1
      g.current_method.ret
      g.end_method
    For example, here’s the built-in minus() function. I wanted to avoid writing a bunch of these.
  • 76.
      function plus(a, b) {
        minus(a, minus(0, b))
      }
    I realized that you could write plus() in Thnad instead, defining it in terms of minus.
  • 77.
      function times(a, b) {
        if (eq(b, 0)) {
          0
        } else {
          plus(a, times(a, minus(b, 1)))
        }
      }
    If you don’t care about bounds checking, you can also do times()...
  • 78.
      function eq(a, b) {
        if (minus(a, b)) {
          0
        } else {
          1
        }
      }
    ...and eq()!
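A quick way to sanity-check these definitions is to transliterate them into Ruby, keeping `minus` as the only arithmetic primitive. The `thnad_if` helper is an assumption that mirrors the compiled conditionals: zero means false, and the branches are lambdas so that only one of them runs. `factorial` is the sample program from the start of the talk. As in the Thnad version, `times` assumes a non-negative second argument.

```ruby
# The only arithmetic primitive.
def minus(a, b)
  a - b
end

# Hypothetical helper: Thnad treats 0 as false and anything else as true.
def thnad_if(cond, if_true, if_false)
  cond.zero? ? if_false.call : if_true.call
end

def plus(a, b)
  minus(a, minus(0, b))
end

def eq(a, b)
  thnad_if(minus(a, b), -> { 0 }, -> { 1 })
end

def times(a, b)
  thnad_if(eq(b, 0), -> { 0 }, -> { plus(a, times(a, minus(b, 1))) })
end

def factorial(n)
  thnad_if(eq(n, 1), -> { 1 }, -> { times(n, factorial(minus(n, 1))) })
end

puts factorial(4)  # prints 24
```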
  • 79. stdthnadlib?!? We have a standard library!
    That means we have a standard library! Doing the Rubinius implementation helped me improve the JRuby version. I was able to go back and rip out most of the built-in functions from that implementation.
  • 80. Thnad Online: github/undees/thnad/tree/master, github/undees/thnad/tree/rbx
    Here’s where you can download and play with either implementation.
  • 81. This has been a fantastic conference. Thank you to our hosts...
  • 82. Special Thanks: Kaspar Schiess for Parslet; Charles Nutter for BiteScript; Ryan Davis and Aja Hammerly for Graph; Brian Ford for guidance; our tireless conference organizers!
    ...and to the makers of JRuby, Rubinius, Parslet, BiteScript, and everything else that made this project possible. Cheers!