Building Languages for the JVM - StarTechConf 2011
1.
Building Languages
for the JVM
Charles Oliver Nutter
Friday, November 4, 2011
2.
Me
• Charles Oliver Nutter
• JRuby and JVM guy
• “headius” on most services
• headius@headius.com
3.
Who Was I?
• Java EE architect
• Successfully!
• Never wrote a parser
• Never wrote a compiler
• But I wanted to learn how...
4.
You
• Java?
• Ruby?
• Python?
• C#?
• Other?
5.
Why Create Languages?
• Nothing is perfect
• New problems need new solutions
• Language design can be fun
• Fame and fortune?
• Not really.
6.
Why Implement a Language?
• To learn it?
• Sort of...
• To learn the target platform
• Definitely!
• Fame and fortune?
• Well...getting there...
7.
Challenges
• Community
• Platform
• Specifications
• Resources
8.
Community
• Investment in status quo
• Afraid to stand out
• Known quantities
• Everything else sucks
• Gotta get paid!
9.
Platform
• Matching language semantics
• JVM designed around Java
• JVM hides underlying platform
• Challenging to use
• Not bad...C/++ would be way worse
• Community may hate it ;-)
10.
Specifications
• Incomplete
• Ruby had none for years
• ...and no complete test suites
• Difficult to implement
• Low level features
• Single-implementation quirks
• Hard or impossible to optimize
11.
Resources
• You gotta eat
• Not much money in language work
• Some parts are hard
• OSS is a necessity
12.
Why JVM?
• Because I am lazy
• Because VMs are *hard*
• Because I can’t be awesome at everything
21.
Ruby on the JVM
• All of Ruby’s power and beauty
• Solid VM underneath
• “Just another JVM language”
22.
JVM Language
• Full interop with Java
• Tricky to do...
• Very rewarding for 99% case
• VM concerns solved
• No need to write a GC
• No need to write a JIT
• ...oh, but wait...
23.
More than a JVM language
• Use native code where JDK fails us
• Paper over ugly bits like CLASSPATH
• Matching Ruby semantics exactly*
• Push JVM forward too!
24.
Playing with JRuby
• Simple IRB demo
• JRuby on Rails - see Jano’s talk tomorrow
• JRuby performance
• PotC (???)
27.
Parser
• Port of MRI’s Bison grammar
• “Jay” parser generator for Java
• Hand-written lexer
• Nearly as fast as the C version
• ...once it gets going
28.
system ~/projects/jruby $ jruby -y -e "1 + 1"
push state 0 value null
reduce state 0 uncover 0 rule (1) $$1 :
goto from state 0 to 2
push state 2 value null
lex state 2 reading tIDENTIFIER value Token { Value=load, Position=file:/Users/headius/projects/jruby/lib/jruby.jar!/jruby/kernel.rb:6}
shift from state 2 to 33
push state 33 value Token { Value=load, Position=file:/Users/headius/projects/jruby/lib/jruby.jar!/jruby/kernel.rb:6}
lex state 33 reading tSTRING_BEG value Token { Value=', Position=file:/Users/headius/projects/jruby/lib/jruby.jar!/jruby/kernel.rb:6}
reduce state 33 uncover 2 rule (487) operation : tIDENTIFIER
goto from state 2 to 62
push state 62 value Token { Value=load, Position=file:/Users/headius/projects/jruby/lib/jruby.jar!/jruby/kernel.rb:6}
reduce state 62 uncover 62 rule (252) $$6 :
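The shift and reduce steps in a trace like this can be illustrated with a much smaller parser. The following is a minimal sketch only: it assumes a hypothetical two-rule grammar (expr : expr + NUM | NUM) and matches patterns on the stack directly, whereas the Jay-generated parser is table driven. The class and method names are invented for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical shift-reduce sketch for: expr : expr '+' NUM | NUM
// It pattern-matches the stack instead of using generated LALR tables.
class ShiftReduceSketch {
    // Parses the tokens and returns the sequence of shift/reduce steps taken.
    static List<String> parse(String[] tokens) {
        List<String> trace = new ArrayList<>();
        List<String> stack = new ArrayList<>();
        int pos = 0;
        for (;;) {
            int n = stack.size();
            if (n >= 3 && stack.get(n - 3).equals("expr")
                    && stack.get(n - 2).equals("+")
                    && stack.get(n - 1).equals("NUM")) {
                // Replace the three right-hand-side symbols with the left-hand side.
                stack.subList(n - 3, n).clear();
                stack.add("expr");
                trace.add("reduce expr : expr + NUM");
            } else if (n == 1 && stack.get(0).equals("NUM")) {
                stack.set(0, "expr");
                trace.add("reduce expr : NUM");
            } else if (pos < tokens.length) {
                // Simplification: every token other than "+" is treated as a number.
                String token = tokens[pos++];
                stack.add(token.equals("+") ? "+" : "NUM");
                trace.add("shift " + token);
            } else {
                break;
            }
        }
        if (stack.size() != 1 || !stack.get(0).equals("expr")) {
            throw new IllegalArgumentException("syntax error");
        }
        return trace;
    }

    public static void main(String[] args) {
        for (String step : parse(new String[] {"1", "+", "1"})) {
            System.out.println(step);
        }
    }
}
```

A real generated parser decides between shift and reduce by looking up the current state and lookahead token in its tables, which is what the state numbers in the trace refer to.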
29.
system ~/projects/jruby $ jruby -y -e "1 + 1"
You will never need this.
30.
public class RubyYaccLexer {
    public static final Encoding UTF8_ENCODING = UTF8Encoding.INSTANCE;
    public static final Encoding USASCII_ENCODING = USASCIIEncoding.INSTANCE;
    public static final Encoding ASCII8BIT_ENCODING = ASCIIEncoding.INSTANCE;
    private static ByteList END_MARKER = new ByteList(new byte[] {'_', 'E', 'N', 'D', '_', '_'});
    private static ByteList BEGIN_DOC_MARKER = new ByteList(new byte[] {'b', 'e', 'g', 'i', 'n'});
    private static ByteList END_DOC_MARKER = new ByteList(new byte[] {'e', 'n', 'd'});
    private static final HashMap<String, Keyword> map;
    static {
        map = new HashMap<String, Keyword>();
        map.put("end", Keyword.END);
        map.put("else", Keyword.ELSE);
        map.put("case", Keyword.CASE);
        map.put("ensure", Keyword.ENSURE);
        map.put("module", Keyword.MODULE);
        map.put("elsif", Keyword.ELSIF);
        map.put("def", Keyword.DEF);
        map.put("rescue", Keyword.RESCUE);
        map.put("not", Keyword.NOT);
        map.put("then", Keyword.THEN);
        map.put("yield", Keyword.YIELD);
        map.put("for", Keyword.FOR);
        map.put("self", Keyword.SELF);
        map.put("false", Keyword.FALSE);
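A keyword table like this is typically consulted after the lexer has scanned a whole identifier, to decide whether it is a reserved word. The following is a minimal sketch of that lookup, assuming a cut-down hypothetical `Keyword` enum and a `lookup` helper that are not part of JRuby.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified keyword table; not JRuby's actual classes.
class KeywordTableSketch {
    enum Keyword { END, ELSE, DEF, YIELD, SELF }

    private static final Map<String, Keyword> KEYWORDS = new HashMap<>();
    static {
        KEYWORDS.put("end", Keyword.END);
        KEYWORDS.put("else", Keyword.ELSE);
        KEYWORDS.put("def", Keyword.DEF);
        KEYWORDS.put("yield", Keyword.YIELD);
        KEYWORDS.put("self", Keyword.SELF);
    }

    // Returns the keyword for the scanned text, or null for an ordinary identifier.
    static Keyword lookup(String identifier) {
        return KEYWORDS.get(identifier);
    }

    public static void main(String[] args) {
        for (String word : new String[] {"def", "foo", "end"}) {
            Keyword keyword = lookup(word);
            System.out.println(word + " -> " + (keyword == null ? "identifier" : "keyword " + keyword));
        }
    }
}
```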
32.
private int yylex() throws IOException {
    int c;
    boolean spaceSeen = false;
    boolean commandState;
    if (lex_strterm != null) {
        int tok = lex_strterm.parseString(this, src);
        if (tok == Tokens.tSTRING_END || tok == Tokens.tREGEXP_END) {
            lex_strterm = null;
            setState(LexState.EXPR_END);
        }
        return tok;
    }
    commandState = commandStart;
    commandStart = false;
    loop: for(;;) {
        c = src.read();
        switch(c) {
33.
case '<':
    return lessThan(spaceSeen);
case '>':
    return greaterThan();
case '"':
    return doubleQuote();
case '`':
    return backtick(commandState);
case '\'':
    return singleQuote();
case '?':
    return questionMark();
case '&':
    return ampersand(spaceSeen);
case '|':
    return pipe();
case '+':
    return plus(spaceSeen);
34.
private int lessThan(boolean spaceSeen) throws IOException {
    int c = src.read();
    if (c == '<' && lex_state != LexState.EXPR_DOT && lex_state != LexState.EXPR_CLASS &&
            !isEND() && (!isARG() || spaceSeen)) {
        int tok = hereDocumentIdentifier();
        if (tok != 0) return tok;
    }
    determineExpressionState();
    switch (c) {
    case '=':
        if ((c = src.read()) == '>') {
            yaccValue = new Token("<=>", getPosition());
            return Tokens.tCMP;
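The pattern in the lexer slides (dispatch on the first character, then read ahead to tell `<`, `<=`, `<=>` and `<<` apart) can be sketched in isolation. This is a minimal sketch under simplifying assumptions: it returns token text rather than token ids, ignores lexer state and heredocs, and uses `java.io.PushbackReader` where JRuby has its own source class. The class name is hypothetical.

```java
import java.io.IOException;
import java.io.PushbackReader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical mini-lexer showing switch dispatch plus lookahead; not JRuby's lexer.
class LookaheadLexerSketch {
    private final PushbackReader src;

    LookaheadLexerSketch(String input) {
        // A pushback buffer of 1 suffices: at most one character is unread at a time.
        this.src = new PushbackReader(new StringReader(input), 1);
    }

    // Returns the next token's text, or null at end of input.
    String next() throws IOException {
        for (;;) {
            int c = src.read();
            switch (c) {
                case -1: return null;
                case ' ': case '\t': case '\n': continue; // skip whitespace
                case '<': return lessThan();
                case '+': return "+";
                default:
                    if (Character.isDigit(c)) return number(c);
                    throw new IOException("unexpected character: " + (char) c);
            }
        }
    }

    // Called after '<' has been consumed; reads ahead to pick the longest operator.
    private String lessThan() throws IOException {
        int c = src.read();
        if (c == '<') return "<<";
        if (c == '=') {
            int c2 = src.read();
            if (c2 == '>') return "<=>";
            unread(c2);
            return "<=";
        }
        unread(c);
        return "<";
    }

    private String number(int first) throws IOException {
        StringBuilder digits = new StringBuilder().append((char) first);
        int c;
        while ((c = src.read()) != -1 && Character.isDigit(c)) digits.append((char) c);
        unread(c);
        return digits.toString();
    }

    private void unread(int c) throws IOException {
        if (c != -1) src.unread(c); // never push back end-of-input
    }

    static List<String> tokenize(String input) {
        try {
            LookaheadLexerSketch lexer = new LookaheadLexerSketch(input);
            List<String> tokens = new ArrayList<>();
            for (String t = lexer.next(); t != null; t = lexer.next()) tokens.add(t);
            return tokens;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(tokenize("1 <=> 2 <= 3 < 4 << 5 + 6"));
    }
}
```

The real `lessThan` also consults `lex_state` and `spaceSeen`, because in Ruby the same characters can start a heredoc or be an operator depending on context.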
35.
%%
program : {
        lexer.setState(LexState.EXPR_BEG);
        support.initTopLocalVariables();
    } top_compstmt {
        // ENEBO: Removed !compile_for_eval which probably is to reduce warnings
        if ($2 != null) {
            /* last expression should not be void */
            if ($2 instanceof BlockNode) {
                support.checkUselessStatement($<BlockNode>2.getLast());
            } else {
                support.checkUselessStatement($2);
            }
        }
        support.getResult().setAST(support.addRootNode($2, support.getPosition($2)));
    }
36.
stmt : kALIAS fitem {
        lexer.setState(LexState.EXPR_FNAME);
    } fitem {
        $$ = support.newAlias($1.getPosition(), $2, $4);
    }
    | kALIAS tGVAR tGVAR {
        $$ = new VAliasNode($1.getPosition(), (String) $2.getValue(), (String) $3.getValue());
    }
    | kALIAS tGVAR tBACK_REF {
        $$ = new VAliasNode($1.getPosition(), (String) $2.getValue(), "$" + $<BackRefNode>3.getType());
    }
    | kALIAS tGVAR tNTH_REF {
        support.yyerror("can't make alias for the number variables");
    }
    | kUNDEF undef_list {
        $$ = $2;
    }
    | stmt kIF_MOD expr_value {
        $$ = new IfNode(support.getPosition($1), support.getConditionNode($3), $1, null);
    }
37.
public Object yyparse (RubyYaccLexer yyLex) throws java.io.IOException {
    if (yyMax <= 0) yyMax = 256; // initial size
    int yyState = 0, yyStates[] = new int[yyMax]; // state stack
    Object yyVal = null, yyVals[] = new Object[yyMax]; // value stack
    int yyToken = -1; // current input
    int yyErrorFlag = 0; // #tokens to shift
    yyLoop: for (int yyTop = 0;; ++ yyTop) {
        if (yyTop >= yyStates.length) { // dynamically increase
            int[] i = new int[yyStates.length+yyMax];
            System.arraycopy(yyStates, 0, i, 0, yyStates.length);
            yyStates = i;
            Object[] o = new Object[yyVals.length+yyMax];
            System.arraycopy(yyVals, 0, o, 0, yyVals.length);
            yyVals = o;
        }
        yyStates[yyTop] = yyState;
        yyVals[yyTop] = yyVal;
        if (yydebug != null) yydebug.push(yyState, yyVal);
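The generated parser keeps two parallel arrays, one for states and one for semantic values, and grows both by a fixed increment when they fill up. A minimal sketch of just that bookkeeping, pulled into a hypothetical helper class (the generated code inlines it in the parse loop):

```java
// Hypothetical helper isolating the grow-on-demand parallel stacks; not Jay's output.
class ParserStacksSketch {
    private final int increment;
    private int[] states;
    private Object[] values;
    private int top = -1;

    ParserStacksSketch(int initialSize) {
        this.increment = initialSize;
        this.states = new int[initialSize];
        this.values = new Object[initialSize];
    }

    // Pushes a state and its value, growing both arrays together if needed.
    void push(int state, Object value) {
        top++;
        if (top >= states.length) {
            int[] s = new int[states.length + increment];
            System.arraycopy(states, 0, s, 0, states.length);
            states = s;
            Object[] v = new Object[values.length + increment];
            System.arraycopy(values, 0, v, 0, values.length);
            values = v;
        }
        states[top] = state;
        values[top] = value;
    }

    // Drops 'count' entries, as a reduce by a rule with 'count' symbols would.
    void pop(int count) {
        if (count < 0 || count > top + 1) throw new IllegalStateException("stack underflow");
        top -= count;
    }

    int topState() { return states[top]; }
    Object topValue() { return values[top]; }
    int depth() { return top + 1; }
    int capacity() { return states.length; }

    public static void main(String[] args) {
        ParserStacksSketch stacks = new ParserStacksSketch(2);
        stacks.push(0, null);
        stacks.push(2, "load");
        stacks.push(33, "'");
        System.out.println("depth=" + stacks.depth() + " capacity=" + stacks.capacity());
    }
}
```

Growing by a fixed increment keeps the arrays small for typical inputs; a doubling strategy would be the usual alternative if very deep stacks were expected.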
59.
public class ASTCompiler {
    private boolean isAtRoot = true;
    public void compileBody(Node node, BodyCompiler context, boolean expr) {
        Node oldBodyNode = currentBodyNode;
        currentBodyNode = node;
        compile(node, context, expr);
        currentBodyNode = oldBodyNode;
    }
    public void compile(Node node, BodyCompiler context, boolean expr) {
        if (node == null) {
            if (expr) context.loadNil();
            return;
        }
        switch (node.getNodeType()) {
        case ALIASNODE:
            compileAlias((AliasNode) node, context, expr);
            break;
        case ANDNODE:
            compileAnd(node, context, expr);
            break;
60.
public void compileIf(Node node, BodyCompiler context, final boolean expr) {
    final IfNode ifNode = (IfNode) node;
    // optimizations if we know ahead of time it will always be true or false
    Node actualCondition = ifNode.getCondition();
    while (actualCondition instanceof NewlineNode) {
        actualCondition = ((NewlineNode)actualCondition).getNextNode();
    }
    if (actualCondition.getNodeType().alwaysTrue()) {
        // compile condition as non-expr and just compile "then" body
        compile(actualCondition, context, false);
        compile(ifNode.getThenBody(), context, expr);
    } else if (actualCondition.getNodeType().alwaysFalse()) {
        // always false or nil
        compile(ifNode.getElseBody(), context, expr);
    } else {
61.
        BranchCallback trueCallback = new BranchCallback() {
            public void branch(BodyCompiler context) {
                if (ifNode.getThenBody() != null) {
                    compile(ifNode.getThenBody(), context, expr);
                } else {
                    if (expr) context.loadNil();
                }
            }
        };
        BranchCallback falseCallback = new BranchCallback() {
            public void branch(BodyCompiler context) {
                if (ifNode.getElseBody() != null) {
                    compile(ifNode.getElseBody(), context, expr);
                } else {
                    if (expr) context.loadNil();
                }
            }
        };
        // normal
        compile(actualCondition, context, true);
        context.performBooleanBranch(trueCallback, falseCallback);
    }
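Two ideas from `compileIf` can be shown on a toy AST: the `expr` flag (emit a value only when the caller needs one) and skipping the branch entirely when the condition is a literal. The following is a minimal sketch; the node classes, the text "instructions", and the label names are all hypothetical and far simpler than JRuby's AST and bytecode output.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical mini-AST and compiler; the names are illustrative, not JRuby's API.
class IfFoldingSketch {
    interface Node {}

    // A literal such as "true", "false", "nil", or "42".
    static final class Literal implements Node {
        final String text;
        Literal(String text) { this.text = text; }
        // In Ruby only false and nil are falsy; every other literal is truthy.
        boolean alwaysFalse() { return text.equals("false") || text.equals("nil"); }
    }

    // A variable read whose value is only known at run time.
    static final class Var implements Node {
        final String name;
        Var(String name) { this.name = name; }
    }

    static final class If implements Node {
        final Node cond, thenBody, elseBody;
        If(Node cond, Node thenBody, Node elseBody) {
            this.cond = cond;
            this.thenBody = thenBody;
            this.elseBody = elseBody;
        }
    }

    final List<String> out = new ArrayList<>();
    private int labels = 0;

    // expr == true means the caller needs the node's value; otherwise nothing is loaded.
    void compile(Node node, boolean expr) {
        if (node == null) {
            if (expr) out.add("load nil");
        } else if (node instanceof Literal) {
            if (expr) out.add("load " + ((Literal) node).text);
        } else if (node instanceof Var) {
            if (expr) out.add("load " + ((Var) node).name);
        } else if (node instanceof If) {
            compileIf((If) node, expr);
        } else {
            throw new IllegalArgumentException("unknown node: " + node);
        }
    }

    void compileIf(If node, boolean expr) {
        if (node.cond instanceof Literal) {
            // Outcome known at compile time: emit only the branch that can run.
            boolean falsy = ((Literal) node.cond).alwaysFalse();
            compile(falsy ? node.elseBody : node.thenBody, expr);
            return;
        }
        int id = labels++;
        compile(node.cond, true); // the condition's value is always needed
        out.add("branch-if-false else" + id);
        compile(node.thenBody, expr);
        out.add("goto end" + id);
        out.add("else" + id + ":");
        compile(node.elseBody, expr);
        out.add("end" + id + ":");
    }

    static List<String> compileToList(Node node) {
        IfFoldingSketch compiler = new IfFoldingSketch();
        compiler.compile(node, true);
        return compiler.out;
    }

    public static void main(String[] args) {
        Node folded = new If(new Literal("true"), new Literal("1"), new Literal("2"));
        Node dynamic = new If(new Var("x"), new Literal("1"), null);
        System.out.println(compileToList(folded));
        System.out.println(compileToList(dynamic));
    }
}
```

Unlike the slide's code, this sketch does not recompile a truthy literal condition for side effects, since its literals have none.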
62.
public abstract class BaseBodyCompiler implements BodyCompiler {
    protected SkinnyMethodAdapter method;
    protected VariableCompiler variableCompiler;
    protected InvocationCompiler invocationCompiler;
    protected int argParamCount;
    protected Label[] currentLoopLabels;
    protected Label scopeStart = new Label();
    protected Label scopeEnd = new Label();
    protected Label redoJump;
    protected boolean inNestedMethod = false;
    private int lastLine = -1;
    private int lastPositionLine = -1;
    protected StaticScope scope;
    protected ASTInspector inspector;
    protected String methodName;
    protected String rubyName;
    protected StandardASMCompiler script;
63.
public void performBooleanBranch(BranchCallback trueBranch,
        BranchCallback falseBranch) {
    Label afterJmp = new Label();
    Label falseJmp = new Label();
    // call isTrue on the result
    isTrue();
    method.ifeq(falseJmp); // EQ == 0 (i.e. false)
    trueBranch.branch(this);
    method.go_to(afterJmp);
    // FIXME: optimize for cases where we have no false branch
    method.label(falseJmp);
    falseBranch.branch(this);
    method.label(afterJmp);
}
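The shape of `performBooleanBranch` (two labels, a conditional jump over the true branch, an unconditional jump over the false branch) is easier to see when the output is plain text. The following is a minimal sketch: it assumes a hypothetical emitter that records pseudo-instructions as strings and takes the branch callbacks as lambdas, in place of the ASM-based `SkinnyMethodAdapter` and `BranchCallback` used on the slide.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical text emitter mimicking the label/branch layout; it emits no real bytecode.
class BooleanBranchSketch {
    private final List<String> code = new ArrayList<>();
    private int nextLabel = 0;

    String newLabel() { return "L" + nextLabel++; }
    void emit(String instruction) { code.add(instruction); }
    void label(String name) { code.add(name + ":"); }
    List<String> code() { return code; }

    // Assumes the condition's 0/1 result has already been pushed by earlier instructions.
    void performBooleanBranch(Consumer<BooleanBranchSketch> trueBranch,
                              Consumer<BooleanBranchSketch> falseBranch) {
        String afterJmp = newLabel();
        String falseJmp = newLabel();
        emit("ifeq " + falseJmp);   // jump when the value is 0 (false)
        trueBranch.accept(this);
        emit("goto " + afterJmp);   // skip over the false branch
        label(falseJmp);
        falseBranch.accept(this);
        label(afterJmp);
    }

    static List<String> demo() {
        BooleanBranchSketch compiler = new BooleanBranchSketch();
        compiler.emit("load cond");
        compiler.performBooleanBranch(
                c -> c.emit("load \"yes\""),
                c -> c.emit("load \"no\""));
        return compiler.code();
    }

    public static void main(String[] args) {
        for (String line : demo()) System.out.println(line);
    }
}
```

Passing the branches as callbacks lets the caller decide what each arm compiles to while the emitter alone owns label placement, which is the same division of labor as on the slide.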