Building Languages for the JVM - StarTechConf 2011
Building Languages
for the JVM
Charles Oliver Nutter
Friday, November 4, 2011
Me
• Charles Oliver Nutter
• JRuby and JVM guy
• “headius” on most services
• headius@headius.com
Who Was I?
• Java EE architect
• Successfully!
• Never wrote a parser
• Never wrote a compiler
• But I wanted to learn how...
You
• Java?
• Ruby?
• Python?
• C#?
• Other?
Why Create Languages?
• Nothing is perfect
• New problems need new solutions
• Language design can be fun
• Fame and fortune?
• Not really.
Why Implement a Language?
• To learn it?
• Sort of...
• To learn the target platform
• Definitely!
• Fame and fortune?
• Well...getting there...
Challenges
• Community
• Platform
• Specifications
• Resources
Community
• Investment in status quo
• Afraid to stand out
• Known quantities
• Everything else sucks
• Gotta get paid!
Platform
• Matching language semantics
• JVM designed around Java
• JVM hides underlying platform
• Challenging to use
• Not bad...C/C++ would be way worse
• Community may hate it ;-)
Specifications
• Incomplete
• Ruby had none for years
• ...and no complete test suites
• Difficult to implement
• Low level features
• Single-implementation quirks
• Hard or impossible to optimize
Resources
• You gotta eat
• Not much money in language work
• Some parts are hard
• OSS is a necessity
Why JVM?
• Because I am lazy
• Because VMs are *hard*
• Because I can’t be awesome at everything
Ok, Why Really?
• Cross-platform
• Libraries
• Languages
• Memory management
• Tools
• OSS
Cross-platform
• OpenJDK: Linux, Windows, Solaris, OS X, xBSD
• J9: Linux, zLinux, AS/400, ...
• HP: OpenVMS, HP/UX, ...
• Dalvik (Android): Linux on ARM, x86
Libraries
• For any need, a dozen libraries
• And a couple of them are good!
• Cross-platform
• Leading edge
Selection of languages
• Java
• Scala
• Clojure
• JRuby
• Mirah
• Jython, Groovy, Fantom, Kotlin, Ceylon, ...
Memory management
• Best GCs in the world
• Fastest object allocation
• Safe escape hatches like NIO
Ruby on the JVM
• All of Ruby’s power and beauty
• Solid VM underneath
• “Just another JVM language”
JVM Language
• Full interop with Java
• Tricky to do...
• Very rewarding for 99% case
• VM concerns solved
• No need to write a GC
• No need to write a JIT
• ...oh, but wait...
More than a JVM language
• Use native code where JDK fails us
• Paper over ugly bits like CLASSPATH
• Matching Ruby semantics exactly*
• Push JVM forward too!
Playing with JRuby
• Simple IRB demo
• JRuby on Rails - see Jano’s talk tomorrow
• JRuby performance
• PotC (???)
Parser
• Port of MRI’s Bison grammar
• “Jay” parser generator for Java
• Hand-written lexer
• Nearly as fast as the C version
• ...once it gets going
system ~/projects/jruby $ jruby -y -e "1 + 1"
push state 0 value null
reduce state 0 uncover 0 rule (1) $$1 :
goto from state 0 to 2
push state 2 value null
lex state 2 reading tIDENTIFIER value Token { Value=load, Position=file:/Users/headius/projects/jruby/lib/jruby.jar!/jruby/kernel.rb:6}
shift from state 2 to 33
push state 33 value Token { Value=load, Position=file:/Users/headius/projects/jruby/lib/jruby.jar!/jruby/kernel.rb:6}
lex state 33 reading tSTRING_BEG value Token { Value=', Position=file:/Users/headius/projects/jruby/lib/jruby.jar!/jruby/kernel.rb:6}
reduce state 33 uncover 2 rule (487) operation : tIDENTIFIER
goto from state 2 to 62
push state 62 value Token { Value=load, Position=file:/Users/headius/projects/jruby/lib/jruby.jar!/jruby/kernel.rb:6}
reduce state 62 uncover 62 rule (252) $$6 :
You will never need this.
public class RubyYaccLexer {
    public static final Encoding UTF8_ENCODING = UTF8Encoding.INSTANCE;
    public static final Encoding USASCII_ENCODING = USASCIIEncoding.INSTANCE;
    public static final Encoding ASCII8BIT_ENCODING = ASCIIEncoding.INSTANCE;
    private static ByteList END_MARKER = new ByteList(new byte[] {'_', 'E', 'N', 'D', '_', '_'});
    private static ByteList BEGIN_DOC_MARKER = new ByteList(new byte[] {'b', 'e', 'g', 'i', 'n'});
    private static ByteList END_DOC_MARKER = new ByteList(new byte[] {'e', 'n', 'd'});
    private static final HashMap<String, Keyword> map;

    static {
        map = new HashMap<String, Keyword>();
        map.put("end", Keyword.END);
        map.put("else", Keyword.ELSE);
        map.put("case", Keyword.CASE);
        map.put("ensure", Keyword.ENSURE);
        map.put("module", Keyword.MODULE);
        map.put("elsif", Keyword.ELSIF);
        map.put("def", Keyword.DEF);
        map.put("rescue", Keyword.RESCUE);
        map.put("not", Keyword.NOT);
        map.put("then", Keyword.THEN);
        map.put("yield", Keyword.YIELD);
        map.put("for", Keyword.FOR);
        map.put("self", Keyword.SELF);
        map.put("false", Keyword.FALSE);
        // ...
    private int yylex() throws IOException {
        int c;
        boolean spaceSeen = false;
        boolean commandState;

        if (lex_strterm != null) {
            int tok = lex_strterm.parseString(this, src);
            if (tok == Tokens.tSTRING_END || tok == Tokens.tREGEXP_END) {
                lex_strterm = null;
                setState(LexState.EXPR_END);
            }
            return tok;
        }

        commandState = commandStart;
        commandStart = false;

        loop: for(;;) {
            c = src.read();
            switch(c) {
            // ...
            case '<':
                return lessThan(spaceSeen);
            case '>':
                return greaterThan();
            case '"':
                return doubleQuote();
            case '`':
                return backtick(commandState);
            case '\'':
                return singleQuote();
            case '?':
                return questionMark();
            case '&':
                return ampersand(spaceSeen);
            case '|':
                return pipe();
            case '+':
                return plus(spaceSeen);
            // ...
    private int lessThan(boolean spaceSeen) throws IOException {
        int c = src.read();
        if (c == '<' && lex_state != LexState.EXPR_DOT && lex_state != LexState.EXPR_CLASS &&
                !isEND() && (!isARG() || spaceSeen)) {
            int tok = hereDocumentIdentifier();
            if (tok != 0) return tok;
        }
        determineExpressionState();
        switch (c) {
        case '=':
            if ((c = src.read()) == '>') {
                yaccValue = new Token("<=>", getPosition());
                return Tokens.tCMP;
            // ...
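The lexer excerpts above boil down to three ideas: read a character, dispatch on it, and look ahead when one character is not enough. Here is a minimal, self-contained sketch of that shape. It is not JRuby's lexer: the `TinyLexer` class, the `Kind` enum, and the five-word keyword table are all hypothetical, and there is no lexer state, here-document handling, or encoding support.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class TinyLexer {
    /** Hypothetical token kinds; the real lexer returns int constants from Tokens. */
    public enum Kind { INTEGER, IDENTIFIER, KEYWORD, PLUS, LESS, CMP }

    public static final class Token {
        final Kind kind;
        final String text;
        Token(Kind kind, String text) { this.kind = kind; this.text = text; }
        @Override public String toString() { return kind + "(" + text + ")"; }
    }

    // Keyword table, in the spirit of the HashMap built in the static block above.
    private static final Map<String, Kind> KEYWORDS = new HashMap<>();
    static {
        for (String word : new String[] {"end", "else", "def", "if", "then"}) {
            KEYWORDS.put(word, Kind.KEYWORD);
        }
    }

    public static List<Token> lex(String src) {
        List<Token> out = new ArrayList<>();
        int i = 0;
        while (i < src.length()) {
            char c = src.charAt(i);
            if (Character.isWhitespace(c)) {
                i++;
            } else if (Character.isDigit(c)) {
                int start = i;
                while (i < src.length() && Character.isDigit(src.charAt(i))) i++;
                out.add(new Token(Kind.INTEGER, src.substring(start, i)));
            } else if (Character.isLetter(c) || c == '_') {
                int start = i;
                while (i < src.length()
                        && (Character.isLetterOrDigit(src.charAt(i)) || src.charAt(i) == '_')) i++;
                String word = src.substring(start, i);
                // A word is a keyword only if the table says so.
                out.add(new Token(KEYWORDS.getOrDefault(word, Kind.IDENTIFIER), word));
            } else if (c == '+') {
                out.add(new Token(Kind.PLUS, "+"));
                i++;
            } else if (c == '<') {
                // Lookahead: "<" alone is one token, "<=>" is another.
                if (src.startsWith("<=>", i)) {
                    out.add(new Token(Kind.CMP, "<=>"));
                    i += 3;
                } else {
                    out.add(new Token(Kind.LESS, "<"));
                    i++;
                }
            } else {
                throw new IllegalArgumentException("unexpected character '" + c + "' at " + i);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(lex("1 + 1"));                 // [INTEGER(1), PLUS(+), INTEGER(1)]
        System.out.println(lex("if a <=> b then 2 end"));
    }
}
```

A real Ruby lexer also has to track state (is `<` a comparison or the start of a heredoc?), which is why `lessThan` above checks `lex_state` before deciding.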
%%
program       : {
                  lexer.setState(LexState.EXPR_BEG);
                  support.initTopLocalVariables();
              } top_compstmt {
                  // ENEBO: Removed !compile_for_eval which probably is to reduce warnings
                  if ($2 != null) {
                      /* last expression should not be void */
                      if ($2 instanceof BlockNode) {
                          support.checkUselessStatement($<BlockNode>2.getLast());
                      } else {
                          support.checkUselessStatement($2);
                      }
                  }
                  support.getResult().setAST(support.addRootNode($2, support.getPosition($2)));
              }
stmt          : kALIAS fitem {
                  lexer.setState(LexState.EXPR_FNAME);
              } fitem {
                  $$ = support.newAlias($1.getPosition(), $2, $4);
              }
              | kALIAS tGVAR tGVAR {
                  $$ = new VAliasNode($1.getPosition(), (String) $2.getValue(), (String) $3.getValue());
              }
              | kALIAS tGVAR tBACK_REF {
                  $$ = new VAliasNode($1.getPosition(), (String) $2.getValue(), "$" + $<BackRefNode>3.getType());
              }
              | kALIAS tGVAR tNTH_REF {
                  support.yyerror("can't make alias for the number variables");
              }
              | kUNDEF undef_list {
                  $$ = $2;
              }
              | stmt kIF_MOD expr_value {
                  $$ = new IfNode(support.getPosition($1), support.getConditionNode($3), $1, null);
              }
public Object yyparse (RubyYaccLexer yyLex) throws java.io.IOException {
    if (yyMax <= 0) yyMax = 256;                          // initial size
    int yyState = 0, yyStates[] = new int[yyMax];         // state stack
    Object yyVal = null, yyVals[] = new Object[yyMax];    // value stack
    int yyToken = -1;                                     // current input
    int yyErrorFlag = 0;                                  // #tokens to shift

    yyLoop: for (int yyTop = 0;; ++ yyTop) {
        if (yyTop >= yyStates.length) {                   // dynamically increase
            int[] i = new int[yyStates.length+yyMax];
            System.arraycopy(yyStates, 0, i, 0, yyStates.length);
            yyStates = i;
            Object[] o = new Object[yyVals.length+yyMax];
            System.arraycopy(yyVals, 0, o, 0, yyVals.length);
            yyVals = o;
        }
        yyStates[yyTop] = yyState;
        yyVals[yyTop] = yyVal;
        if (yydebug != null) yydebug.push(yyState, yyVal);
        // ...
public class ASTCompiler {
    private boolean isAtRoot = true;

    public void compileBody(Node node, BodyCompiler context, boolean expr) {
        Node oldBodyNode = currentBodyNode;
        currentBodyNode = node;
        compile(node, context, expr);
        currentBodyNode = oldBodyNode;
    }

    public void compile(Node node, BodyCompiler context, boolean expr) {
        if (node == null) {
            if (expr) context.loadNil();
            return;
        }
        switch (node.getNodeType()) {
        case ALIASNODE:
            compileAlias((AliasNode) node, context, expr);
            break;
        case ANDNODE:
            compileAnd(node, context, expr);
            break;
        // ...
    public void compileIf(Node node, BodyCompiler context, final boolean expr) {
        final IfNode ifNode = (IfNode) node;

        // optimizations if we know ahead of time it will always be true or false
        Node actualCondition = ifNode.getCondition();
        while (actualCondition instanceof NewlineNode) {
            actualCondition = ((NewlineNode)actualCondition).getNextNode();
        }

        if (actualCondition.getNodeType().alwaysTrue()) {
            // compile condition as non-expr and just compile "then" body
            compile(actualCondition, context, false);
            compile(ifNode.getThenBody(), context, expr);
        } else if (actualCondition.getNodeType().alwaysFalse()) {
            // always false or nil
            compile(ifNode.getElseBody(), context, expr);
        } else {
            BranchCallback trueCallback = new BranchCallback() {
                public void branch(BodyCompiler context) {
                    if (ifNode.getThenBody() != null) {
                        compile(ifNode.getThenBody(), context, expr);
                    } else {
                        if (expr) context.loadNil();
                    }
                }
            };

            BranchCallback falseCallback = new BranchCallback() {
                public void branch(BodyCompiler context) {
                    if (ifNode.getElseBody() != null) {
                        compile(ifNode.getElseBody(), context, expr);
                    } else {
                        if (expr) context.loadNil();
                    }
                }
            };

            // normal
            compile(actualCondition, context, true);
            context.performBooleanBranch(trueCallback, falseCallback);
        }
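The `compile` switch and the `alwaysTrue()`/`alwaysFalse()` shortcut in `compileIf` can be tried out without any of JRuby's machinery. The sketch below is a toy under stated assumptions: `FoldingCompiler`, its six-entry `NodeType` enum, and the string "instructions" it emits are all hypothetical stand-ins, not JRuby classes or real bytecode.

```java
import java.util.ArrayList;
import java.util.List;

public class FoldingCompiler {
    /** Hypothetical node types; the real NodeType enum is far larger. */
    public enum NodeType {
        TRUE, FALSE, NIL, INT, CALL, IF;
        // In Ruby only false and nil are falsy, so a literal integer is always true.
        boolean alwaysTrue()  { return this == TRUE || this == INT; }
        boolean alwaysFalse() { return this == FALSE || this == NIL; }
    }

    public static final class Node {
        final NodeType type;
        final String text;                    // literal text or method name
        final Node cond, thenBody, elseBody;  // only set for IF nodes

        private Node(NodeType type, String text, Node cond, Node thenBody, Node elseBody) {
            this.type = type;
            this.text = text;
            this.cond = cond;
            this.thenBody = thenBody;
            this.elseBody = elseBody;
        }
        public static Node leaf(NodeType type, String text) {
            return new Node(type, text, null, null, null);
        }
        public static Node ifNode(Node cond, Node thenBody, Node elseBody) {
            return new Node(NodeType.IF, "if", cond, thenBody, elseBody);
        }
    }

    private final List<String> code = new ArrayList<>();
    private int labels = 0;

    /** Dispatch on node type; expr says whether the caller needs a value. */
    void compile(Node node, boolean expr) {
        if (node == null) {
            if (expr) code.add("load nil");
            return;
        }
        switch (node.type) {
            case CALL:
                code.add("call " + node.text);
                if (!expr) code.add("pop");   // result unused: discard it
                break;
            case IF:
                compileIf(node, expr);
                break;
            default:                          // literals: emit only if a value is needed
                if (expr) code.add("load " + node.text);
                break;
        }
    }

    void compileIf(Node node, boolean expr) {
        if (node.cond.type.alwaysTrue()) {
            compile(node.cond, false);        // keep side effects, drop the value
            compile(node.thenBody, expr);
        } else if (node.cond.type.alwaysFalse()) {
            compile(node.elseBody, expr);
        } else {
            int id = labels++;
            compile(node.cond, true);
            code.add("ifeq else" + id);
            compile(node.thenBody, expr);
            code.add("goto end" + id);
            code.add("else" + id + ":");
            compile(node.elseBody, expr);
            code.add("end" + id + ":");
        }
    }

    public static List<String> compileToList(Node root) {
        FoldingCompiler compiler = new FoldingCompiler();
        compiler.compile(root, true);
        return compiler.code;
    }

    public static void main(String[] args) {
        Node one = Node.leaf(NodeType.INT, "1");
        Node two = Node.leaf(NodeType.INT, "2");
        // Constant condition: only the "then" body survives.
        System.out.println(compileToList(Node.ifNode(Node.leaf(NodeType.TRUE, "true"), one, two)));
        // Unknown condition: both bodies plus the branch.
        System.out.println(compileToList(Node.ifNode(Node.leaf(NodeType.CALL, "ready?"), one, two)));
    }
}
```

The first call prints `[load 1]`: the branch and the dead "else" body never reach the output. The second prints the full seven-instruction sequence with both bodies.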
public abstract class BaseBodyCompiler implements BodyCompiler {
    protected SkinnyMethodAdapter method;
    protected VariableCompiler variableCompiler;
    protected InvocationCompiler invocationCompiler;
    protected int argParamCount;
    protected Label[] currentLoopLabels;
    protected Label scopeStart = new Label();
    protected Label scopeEnd = new Label();
    protected Label redoJump;
    protected boolean inNestedMethod = false;
    private int lastLine = -1;
    private int lastPositionLine = -1;
    protected StaticScope scope;
    protected ASTInspector inspector;
    protected String methodName;
    protected String rubyName;
    protected StandardASMCompiler script;
    // ...
    public void performBooleanBranch(BranchCallback trueBranch, BranchCallback falseBranch) {
        Label afterJmp = new Label();
        Label falseJmp = new Label();

        // call isTrue on the result
        isTrue();

        method.ifeq(falseJmp); // EQ == 0 (i.e. false)
        trueBranch.branch(this);
        method.go_to(afterJmp);

        // FIXME: optimize for cases where we have no false branch
        method.label(falseJmp);
        falseBranch.branch(this);
        method.label(afterJmp);
    }
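The jump layout in `performBooleanBranch` (test, "then" code, jump over, "else" code, join) is easier to trust once you watch it run. The sketch below imitates it with a hypothetical `BranchEmitter` that emits strings instead of bytecode, plus a tiny interpreter for those strings. None of these names come from JRuby or ASM. The only instructions supported are `push N`, `ifeq L`, `goto L`, and labels, and the condition is assumed to already be an int on the stack (the job `isTrue()` does in the real code).

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class BranchEmitter {
    /** Hypothetical stand-in for the BranchCallback interface above. */
    public interface BranchCallback { void branch(BranchEmitter context); }

    private final List<String> code = new ArrayList<>();
    private int nextLabel = 0;

    String newLabel() { return "L" + nextLabel++; }
    public void emit(String insn) { code.add(insn); }

    /** Same shape as the slide: ifeq to the false label, goto past it after the true branch. */
    public void performBooleanBranch(BranchCallback trueBranch, BranchCallback falseBranch) {
        String afterJmp = newLabel();
        String falseJmp = newLabel();
        emit("ifeq " + falseJmp);        // pops the condition; jumps when it is 0 (false)
        trueBranch.branch(this);
        emit("goto " + afterJmp);
        emit(falseJmp + ":");
        falseBranch.branch(this);
        emit(afterJmp + ":");
    }

    /** Emits "if condition then 10 else 20" for a constant condition already on the stack. */
    public static List<String> compileIf(int condition) {
        BranchEmitter emitter = new BranchEmitter();
        emitter.emit("push " + condition);
        emitter.performBooleanBranch(ctx -> ctx.emit("push 10"), ctx -> ctx.emit("push 20"));
        return emitter.code;
    }

    /** Runs an emitted listing and returns the value left on top of the stack. */
    public static int run(List<String> code) {
        // First pass: record where each label lives.
        Map<String, Integer> labels = new HashMap<>();
        for (int i = 0; i < code.size(); i++) {
            String insn = code.get(i);
            if (insn.endsWith(":")) labels.put(insn.substring(0, insn.length() - 1), i);
        }
        // Second pass: execute.
        Deque<Integer> stack = new ArrayDeque<>();
        int pc = 0;
        while (pc < code.size()) {
            String[] parts = code.get(pc).split(" ");
            switch (parts[0]) {
                case "push":
                    stack.push(Integer.parseInt(parts[1]));
                    pc++;
                    break;
                case "ifeq":
                    pc = (stack.pop() == 0) ? labels.get(parts[1]) : pc + 1;
                    break;
                case "goto":
                    pc = labels.get(parts[1]);
                    break;
                default:                 // a label: nothing to execute
                    pc++;
                    break;
            }
        }
        return stack.pop();
    }

    public static void main(String[] args) {
        System.out.println(compileIf(1));       // [push 1, ifeq L1, push 10, goto L0, L1:, push 20, L0:]
        System.out.println(run(compileIf(1)));  // 10
        System.out.println(run(compileIf(0)));  // 20
    }
}
```

Passing the branches as callbacks, as the real code does, lets the emitter decide where the labels go while the caller decides what each branch contains.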