Hacking parse.y (RubyConf 2009)
Published on Nov 19, 2009

Hacking parse.y
Tatsuhiro Ujihisa
ujihisa@gmail.com
http://ujihisa.blogspot.com/
@ujm
hi
•I'm from Japan
DISCLAIMER
•This presentation is not for super Rubyists or Ruby committers, but for ordinary programmers.
Hacking parse.y
•Ruby's syntax
Hacking parse.y
Fixing the Ruby parser to understand Ruby
•Introducing new syntax
  • {:key :-) "value"}
  • 'symbol
  • ++i
  • def A#b(c)
  • {|x| x * 2 }
MRI Inside
•MRI (Matz Ruby Implementation)
•$ ruby -v
  ruby 1.9.2dev (2009-11-19 trunk 25862) [i386-darwin9.8.0]
•Written in C
  •array.c, vm.c, gc.c, etc.
ruby 1.8 vs 1.9
•1.8 and earlier
  •Parser: parse.y
  •Evaluator: eval.c
•1.9 and later
  •Parser: parse.y
  •Evaluator: YARV (vm*.c)
Matz said
•Ugly: eval.c and parse.y
  (RubyConf 2006)
•Now the original evaluator has been entirely replaced with YARV
MRI Parser
•MRI uses yacc (a parser generator for C); the build runs bison, a yacc-compatible generator
•parse.y
  bison -d -o y.tab.c parse.y
  sed -f ./tool/ytab.sed -e "/^#/s!y.tab.c!parse.c!" y.tab.c > parse.c.new
  ...
parse.y
•One of the darkest parts of MRI
•$ wc -l *{c,h,y} | sort -n
  ...
    9961 io.c
   10474 parse.y
   16367 parse.c # (automatically generated)
  188656 total
Parser (in the broad sense)
•Lexer (yylex)
  •Bytes → Tokens (grammar symbols)
•Parser (yyparse)
  •Tokens → Syntax Tree
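Both stages can be observed from plain Ruby 1.9+ through Ripper. Ripper is the stdlib parser built from the same parse.y, using the `/*% ... %*/` actions. `Ripper.tokenize` and `Ripper.lex` expose the lexer's output, and `Ripper.sexp` exposes the parse tree. This is a small sketch only: exact tree shapes differ between Ruby versions, so it checks only coarse properties.

```ruby
require 'ripper'

src = "{:key => 'value'}"

# Stage 1, lexer: bytes -> tokens. Ripper.tokenize keeps every token,
# including whitespace, so joining the pieces reproduces the source.
tokens = Ripper.tokenize(src)
p tokens

# The same tokens with their kinds: each entry starts [[line, column], event, text]
Ripper.lex(src).each do |_pos, event, text, *_rest|
  puts format('%-20s %p', event, text)
end

# Stage 2, parser: tokens -> syntax tree, as nested arrays
tree = Ripper.sexp(src)
p tree
```

Note that `:key` arrives as two tokens (a symbol-begin ':' and the identifier), which is exactly the colon handling dissected in the next slides.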
Tokens in Lexer
%token tUPLUS           /* unary+ */
%token tUMINUS          /* unary- */
%token tPOW             /* ** */
%token tCMP             /* <=> */
%token tEQ              /* == */
%token tEQQ             /* === */
%token tNEQ             /* != */
%token tGEQ             /* >= */
%token tLEQ             /* <= */
%token tANDOP tOROP     /* && and || */
%token tMATCH tNMATCH   /* =~ and !~ */
%token tDOT2 tDOT3      /* .. and ... */
%token tAREF tASET      /* [] and []= */
%token tLSHFT tRSHFT    /* << and >> */
%token tCOLON2          /* :: */
%token tCOLON3          /* :: at EXPR_BEG */
%token <id> tOP_ASGN    /* +=, -= etc. */
%token tASSOC           /* => */
%token tLPAREN          /* ( */
%token tLPAREN_ARG      /* ( */
%token tRPAREN          /* ) */
%token tLBRACK          /* [ */
%token tLBRACE          /* { */
%token tLBRACE_ARG      /* { */
%token tSTAR            /* * */
%token tAMPER           /* & */
%token tLAMBDA          /* -> */
%token tSYMBEG tSTRING_BEG tXSTRING_ tWORDS_BEG tQWORDS_BEG
%token tSTRING_DBEG tSTRING_DVAR tST
...
(detour)
•MRI: parse.y (10474 lines)
•JRuby: src/org/jruby/{parser, lexer}/* (24983 lines)
  •parser/DefaultRubyParser.y (1880 lines)
  •parser/Ruby19Parser.y (2076 lines)
•Rubinius: lib/ext/melbourne/grammer.y (5891 lines) and others
Case 1: :-)
•Hash literal
  {:key => 'value'}
  {:key :-) 'value'}
•:-) is just an alias for =>
Mastering “Colon”
Colons in Ruby
•A::B, ::C
•:symbol, :"sy-m-bol"
•a ? b : c
•{a: b}
•when 1: something (in 1.8)
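Each colon form above can be tried in a stock Ruby 1.9+ script. The sketch below is runnable and leaves out `when 1: something`, which only 1.8 accepts. The token names in the comments follow the slides. The `tLABEL` note for `{a: 1}` reflects my understanding that 1.9 lexes the label together with its name, outside the ':' case.

```ruby
# The colon forms from the slide, in syntax Ruby 1.9 and later accept.
module A
  B = :b_constant
end
C = :top_level_c

n = 3
ab   = A::B                  # `::` between two names        -> tCOLON2
top  = ::C                   # `::` at the start of an expr  -> tCOLON3
sym  = :symbol               # `:` opening a symbol          -> tSYMBEG
dsym = :"sy-m-bol"           # `:"` opening a quoted symbol  -> tSYMBEG
cond = n > 2 ? :big : :small # the "else" colon of a ternary -> ':'
h    = {a: 1}                # label `a:` (a tLABEL token)

p ab, top, sym, dsym, cond, h
```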
static int
parser_yylex(struct parser_params *parser)
{
    ...
    switch (c = nextc()) {
      ...
      case '#':  /* it's a comment */
        ...
      case ':':
        c = nextc();
        if (c == ':') {
            if (IS_BEG() || ...
        ...
    }
    ...
(about 1300 lines)
How does the parser deal with colons?
•:: → tCOLON2 or tCOLON3
  •tCOLON2: Net::URI
  •tCOLON3: ::Kernel
enum lex_state_e {
    EXPR_BEG,     /* ignore newline, +/- is a sign. */
    EXPR_END,     /* newline significant, +/- is an operator. */
    EXPR_ENDARG,  /* ditto, and unbound braces. */
    EXPR_ARG,     /* newline significant, +/- is an operator. */
    EXPR_CMDARG,  /* newline significant, +/- is an operator. */
    EXPR_MID,     /* newline significant, +/- is an operator. */
    EXPR_FNAME,   /* ignore newline, no reserved words. */
    EXPR_DOT,     /* right after `.' or `::', no reserved words. */
    EXPR_CLASS,   /* immediate after `class', no here document. */
    EXPR_VALUE    /* alike EXPR_BEG but label is disallowed. */
};
lex_state
case ':':
  c = nextc();
  if (c == ':') {
      if (IS_BEG() ||
          lex_state == EXPR_CLASS ||
          (IS_ARG() && space_seen)) {
          lex_state = EXPR_BEG;
          return tCOLON3;
      }
      lex_state = EXPR_DOT;
      return tCOLON2;
  }
...
if (lex_state == EXPR_END ||
    lex_state == EXPR_ENDARG ||
    (c != -1 && ISSPACE(c))) {
    pushback(c);
    lex_state = EXPR_BEG;
    return ':';
}
switch (c) {
  case '\'':
    lex_strterm = NEW_STRTERM(str_ssym, c, 0);
    break;
  case '"':
    lex_strterm = NEW_STRTERM(str_dsym, c, 0);
    break;
  default:
    pushback(c);
    break;
}
lex_state = EXPR_FNAME;
return tSYMBEG;
How does the parser deal with colons? (summary)
•:: → tCOLON2 or tCOLON3
•in EXPR_END/EXPR_ENDARG, or followed by whitespace → ':' (the "else" of a ? b : c)
•otherwise → tSYMBEG
  •:' → str_ssym
  •:" → str_dsym
So,
•:-) → tASSOC
•:: → tCOLON2 or tCOLON3
•in EXPR_END/EXPR_ENDARG, or followed by whitespace → ':' (the "else" of a ? b : c)
•otherwise → tSYMBEG
  •:' → str_ssym
  •:" → str_dsym
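The rules above fit in a small decision function. Below is a toy model in Ruby, not MRI's code and not the author's patch. `lex_colon`, the state symbols and the membership of `IS_BEG()`/`IS_ARG()` are assumptions made for illustration. It follows the branches on the slides and adds the `:-)` rule at one plausible spot. That spot comes before the EXPR_END check, because right after a symbol like `:key` the lexer is (presumably) in EXPR_END and would otherwise hand out a plain ':'.

```ruby
# Toy model of the ':' case in parser_yylex (not MRI's code).
# `state` plays the role of lex_state; `space_seen` says whether whitespace
# came right before the colon. Returns [token, chars consumed, string term].
BEG_STATES = [:EXPR_BEG, :EXPR_MID, :EXPR_VALUE].freeze # assumed IS_BEG()
ARG_STATES = [:EXPR_ARG, :EXPR_CMDARG].freeze           # assumed IS_ARG()

def lex_colon(src, pos, state, space_seen: false)
  c = src[pos + 1] # nextc(): the character after the first ':'
  if c == ':'
    if BEG_STATES.include?(state) || state == :EXPR_CLASS ||
       (ARG_STATES.include?(state) && space_seen)
      return [:tCOLON3, 2, nil] # ::Kernel
    end
    return [:tCOLON2, 2, nil]   # Net::URI
  end
  # Case 1: ":-)" as another spelling of "=>", checked before the
  # EXPR_END rule so that `:key :-)` does not yield a ternary ':'.
  return [:tASSOC, 3, nil] if src[pos + 1, 2] == '-)'
  if state == :EXPR_END || state == :EXPR_ENDARG || (c && c.match?(/\s/))
    return [:':', 1, nil]       # the "else" colon of a ? b : c
  end
  case c
  when "'" then [:tSYMBEG, 2, :str_ssym] # :'sym'
  when '"' then [:tSYMBEG, 2, :str_dsym] # :"sy-m-bol"
  else          [:tSYMBEG, 1, nil]       # :symbol
  end
end

p lex_colon("{:key :-) 'value'}", 6, :EXPR_END, space_seen: true) # prints [:tASSOC, 3, nil]
```

Putting the smiley check first only matters for the exact sequence ":-)". So `cond ? x :-1` still reads as the ternary else branch followed by `-1`.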
:-)
DISCLAIMER
•This presentation is not for super Rubyists or Ruby committers, but for ordinary programmers.
Case 2: Lisp-like Symbol
•Symbol literal
  :vancouver
  'vancouver
•Ad hoc
  p :a, :b
  p 'a, 'b
Single Quote
(in parser_yylex)
...
case '\'':
  lex_strterm = NEW_STRTERM(str_squote, '\'', 0);
  return tSTRING_BEG;
...
Single Quote
(in parser_yylex)
...
case '\'':
  if (??? condition ???) {
      lex_state = EXPR_FNAME;
      return tSYMBEG;
  }
  lex_strterm = NEW_STRTERM(str_squote, '\'', 0);
  return tSTRING_BEG;
...
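The slide leaves the condition open. Here is one hypothetical heuristic, written in Ruby for readability; it is not the author's condition, and `lisp_symbol_start?`/`quote_token` are made-up names. It treats `'` as the start of a symbol when an identifier follows and that identifier is not closed by another `'` right away. It also shows why the feature is ad hoc: `'hello world'` would now lex as the symbol `hello` followed by leftovers.

```ruby
# Hypothetical stand-in for "??? condition ???": should the quote at
# src[pos] begin a Lisp-style symbol (tSYMBEG) or a string (tSTRING_BEG)?
IDENT = /\A[A-Za-z_][A-Za-z0-9_]*/

def lisp_symbol_start?(src, pos)
  rest = src[(pos + 1)..-1] or return false
  name = rest[IDENT] or return false # must be followed by an identifier
  rest[name.length] != "'"           # 'abc' (closed right away) stays a string
end

def quote_token(src, pos)
  lisp_symbol_start?(src, pos) ? :tSYMBEG : :tSTRING_BEG
end

p quote_token("p 'a, 'b", 2)    # the quote before a
p quote_token("'hello'", 0)     # an ordinary string
```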
(loop
  (lambda (p 'good)))
Case 3: Pre-increment Operator
•++i
•i = i.succ
  (NOT i = i + 1)
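Expanding to `succ` instead of `+ 1` means `++` would also work on values that have no integer `+`, such as strings. Stock Ruby reads `++i` as two unary pluses (`+(+i)`), so this check writes the expansion out by hand.

```ruby
# What the patched `++x` expands to: x = x.succ
i = 9
i = i.succ  # ++i
s = "az"
s = s.succ  # ++s works for strings too
v = "1.9"
v = v.succ  # carries like a version number

# `x = x + 1` would only work where + accepts an Integer:
begin
  "az" + 1
rescue TypeError => e
  err = e.class
end

p i, s, v, err
```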
Lexer
@@ -685,6 +685,7 @@ static void
 token_info_pop(struct parser_params*, const char *token);
 %type <val> program reswords then do dot_or_colon
 %*/
 %token tUPLUS           /* unary+ */
+%token tINCR            /* ++var */
 %token tUMINUS          /* unary- */
 %token tPOW             /* ** */
 %token tCMP             /* <=> */
(There are actually a few more trivial changes.)
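Declaring `tINCR` is not enough: the `'+'` case of the lexer also has to emit it. The author's other changes are not shown, so the following is only a hypothetical Ruby model of one way to do it. It assumes what `IS_BEG()` covers and ignores the EXPR_FNAME/EXPR_DOT and numeric-literal branches of the real `'+'` case. At the beginning of an expression, `++` directly before a variable becomes `tINCR`, and everything else falls back to the usual rules.

```ruby
# Hypothetical model of the '+' case once tINCR exists.
# Returns [token, chars consumed].
BEG_LIKE = [:EXPR_BEG, :EXPR_MID, :EXPR_VALUE, :EXPR_CLASS].freeze # assumed IS_BEG()

def lex_plus(src, pos, state)
  nxt = src[pos + 1]
  # New rule: "++" at the start of an expression, right before a variable
  if BEG_LIKE.include?(state) && nxt == '+' && src[pos + 2].to_s.match?(/[A-Za-z_@$]/)
    return [:tINCR, 2]
  end
  return [:tOP_ASGN, 2] if nxt == '='             # i += 1
  return [:tUPLUS, 1] if BEG_LIKE.include?(state) # unary plus, e.g. +i
  [:'+', 1]                                       # binary plus, e.g. a + b
end
```

Because the new rule fires only in a beginning-of-expression state, `a ++ b` still lexes as a binary `+` followed by a unary `+`.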
Regenerate id.h
•id.h is generated automatically from parse.y during make
•$ rm id.h
 $ make
parser example
variable : tIDENTIFIER
         | tIVAR
         | tGVAR
         | tCONSTANT
         | tCVAR
         | keyword_nil {ifndef_ripper($$ = keyword_nil);}
         | keyword_self {ifndef_ripper($$ = keyword_self);}
         | keyword_true {ifndef_ripper($$ = keyword_true);}
         | keyword_false {ifndef_ripper($$ = keyword_false);}
         | keyword__FILE__ {ifndef_ripper($$ = keyword__FILE__);}
         | keyword__LINE__ {ifndef_ripper($$ = keyword__LINE__);}
         | keyword__ENCODING__ {ifndef_ripper($$ = keyword__ENCODING__);}
         ;
lhs : variable
      {
      /*%%%*/
        if (!($$ = assignable($1, 0))) $$ = NEW_BEGIN(0);
      /*%
        $$ = dispatch1(var_field, $1);
      %*/
      }
    | primary_value '[' opt_call_args rbracket
      {
      /*%%%*/
        $$ = aryset($1, $3);
      /*%
        $$ = dispatch2(aref_field, $1, escape_Qundef($3));
      %*/
      }
    ...
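Each action above has two halves. The code after `/*%%%*/` builds MRI's own NODE tree. The part inside `/*% ... %*/` is what Ripper is compiled from. Ripper's output therefore names the dispatched events directly, so the two `lhs` alternatives show up as `var_field` and `aref_field`. The check below looks only at the outer tree shape, since the inner details vary across Ruby versions.

```ruby
require 'ripper'

# `a = 1`: lhs via `variable`, dispatched as var_field
simple = Ripper.sexp("a = 1")
# `x[0] = 1`: lhs via `primary_value '[' ...`, dispatched as aref_field
indexed = Ripper.sexp("x[0] = 1")

p simple
p indexed
```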
BNF (part)
program  : compstmt
compstmt : stmts opt_terms
stmts    : none
         | stmt
         | stmts terms stmt
stmt     : kALIAS fitem fitem
         | kALIAS tGVAR tGVAR
         ...
         | expr
expr     : kRETURN call_args
         | kBREAK call_args
         ...
         | '!' command_call
         | arg
arg      : lhs '=' arg
         | var_lhs tOP_ASGN arg
         | primary_value '[' aref_args ']' tOP
         ...
         | arg '?' arg ':' arg
         | primary
primary  : literal
         | strings
         ...
         | tLPAREN_ARG expr ')'
         | tLPAREN compstmt ')'
         ...
         | kREDO
         | kRETRY
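Left-recursive rules such as `stmts : stmts terms stmt` suit yacc well, and they describe a loop: an optional `stmt`, then any number of `terms stmt` pairs. Below is a toy recursive-descent reading of the first three rules in Ruby. `parse_program` is a made-up name, and for illustration only a `stmt` is a single word while `terms` is a run of `;` or newlines.

```ruby
require 'strscan'

# Toy reading of:
#   program  : compstmt
#   compstmt : stmts opt_terms
#   stmts    : none | stmt | stmts terms stmt
# Assumptions: a stmt is one word (\w+); terms are ';' or newlines.
def parse_program(src)
  s = StringScanner.new(src)
  stmts = []
  s.skip(/[ \t]*/)
  word = s.scan(/\w+/)               # stmts : none | stmt
  stmts << word if word
  loop do                            # stmts : stmts terms stmt
    saved = s.pos
    break unless s.skip(/[ \t]*[;\n][ \t;\n]*/) # terms
    word = s.scan(/\w+/)
    unless word                      # no stmt after them: they were opt_terms
      s.pos = saved
      break
    end
    stmts << word
  end
  s.skip(/[ \t;\n]*/)                # opt_terms
  raise ArgumentError, "syntax error at offset #{s.pos}" unless s.eos?
  [:program, stmts]
end

p parse_program("a; b\nc")
```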
Assign
stmt : ...
     | mlhs '=' command_call
       {
       /*%%%*/
         value_expr($3);
         $1->nd_value = $3;
         $$ = $1;
       /*%
         $$ = dispatch2(massign, $1, $3);
       %*/
       }
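The `dispatch2(massign, $1, $3)` half of this rule is visible from Ripper. A right-hand side that is a parenthesis-free call such as `foo 1` is a `command_call`, so this source goes through exactly the rule above. The check only asserts the event name at the top of the statement.

```ruby
require 'ripper'

# mlhs '=' command_call: `foo 1` is a method call without parentheses
tree = Ripper.sexp("a, b = foo 1")
p tree
```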
mlhs
mlhs: mlhs_basic | ...
mlhs_basic: mlhs_head | ...
mlhs_head: mlhs_item ',' | ...
mlhs_item: mlhs_node | ...
mlhs_node: variable { $$ = assignable($1, 0); }
Method call
block_command : block_call
              | block_call '.' operation2 command_args
                {
                /*%%%*/
                  $$ = NEW_CALL($1, $3, $4);
                /*%
                  $$ = dispatch3(call, $1, ripper_id2sym('.'), $3);
                  $$ = method_arg($$, $4);
                %*/
                }
Mix!
var_ref : ...
        | tINCR variable
          {
          /*%%%*/
            $$ = assignable($2, 0);
            $$->nd_value = NEW_CALL(gettable($$->nd_vid),
                                    rb_intern("succ"), 0);
          /*%
            $$ = dispatch2(unary, ripper_intern("++@"), $2);
          %*/
          }
++ruby
Case 4: def A#b
•A#b
  instance method b of class A
•A.b
  class method b of class A
A#b
class A
  def b
    ...
  end
end

def A.b
  ...
end
A#b
def A#b
  ...
end

def A.b
  ...
end
#
(in parser_yylex)
case '#':  /* it's a comment */
  /* no magic_comment in shebang line */
  if (!parser_magic_comment(parser, lex_p, lex_pend - lex_p)) {
      if (comment_at_top(parser)) {
          set_file_encoding(parser, lex_p, lex_pend);
      }
  }
  lex_p = lex_pend;
#
(in parser_yylex)
case '#':  /* it's a comment */
  c = nextc();
  pushback(c);
  if (lex_state == EXPR_END && ISALNUM(c)) return '#';
  /* no magic_comment in shebang line */
  if (!parser_magic_comment(parser, lex_p, lex_pend - lex_p)) {
      if (comment_at_top(parser)) {
          set_file_encoding(parser, lex_p, lex_pend);
  ...
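The patched branch is a simple peek-and-decide. Here it is modeled in Ruby. `lex_hash_sign` is a hypothetical name, the state is passed in by hand, and the regex stands in for the ASCII-only `ISALNUM`. In EXPR_END, a `#` directly followed by a letter or digit becomes a token. Otherwise the rest of the line is skipped as a comment.

```ruby
# Hypothetical model of the patched '#' case: peek at the next character
# (nextc + pushback); in EXPR_END, '#' followed by an ASCII letter or digit
# becomes a '#' token, otherwise the rest of the line is a comment.
def lex_hash_sign(src, pos, state)
  c = src[pos + 1]
  if state == :EXPR_END && c && c.match?(/[A-Za-z0-9]/)
    return [:'#', 1]
  end
  eol = src.index("\n", pos) || src.length
  [:comment, eol - pos] # lex_p = lex_pend: skip to the end of the line
end

p lex_hash_sign("def A#b", 5, :EXPR_END)
p lex_hash_sign("x = 1 # note", 6, :EXPR_END)
```

The model also exposes the trade-off of this rule. After any expression end, a comment written without a space, as in `x = 1 #note`, would stop being a comment.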
Primary
primary : literal
        | ...
        | k_def singleton dot_or_colon {lex_state = EXPR_FNAME;} fname
          {
            in_single++;
            lex_state = EXPR_END; /* force for args */
          /*%%%*/
            local_push(0);
          /*%
          %*/
          }
          f_arglist
          bodystmt
          k_end
          {
          /*%%%*/
            NODE *body = remove_begin($8);
            reduce_nodes(&body);
            $$ = NEW_DEFS($2, $5, $7, body);
            fixpos($$, $2);
            local_pop();
          /*%
            $$ = dispatch5(defs, $2, $3, $5, $7, $8);
          %*/
            in_single--;
          }
        | k_def cname '#' {lex_state = EXPR_FNAME;} fname
          {
            $<id>$ = cur_mid;
            cur_mid = $5;
            in_def++;
          /*%%%*/
            local_push(0);
          /*%
          %*/
          }
          f_arglist
          bodystmt
          k_end
          {
          /*%%%*/
            NODE *body = remove_begin($8);
            reduce_nodes(&body);
            $$ = NEW_DEFN($5, $7, body, NOEX_PRIVATE);
            fixpos($$, $7);
            fixpos($$->nd_defn, $7);
            $$ = NEW_CLASS(NEW_COLON3($2), $$, 0);
            nd_set_line($$, $<num>6);
            local_pop();
          /*%
            $$ = dispatch4(defi, $2, $5, $7, $8);
          %*/
            in_def--;
            cur_mid = $<id>6;
          }
Reference
Rubyソースコード完全解説 ("Ruby Source Code Complete Explanation")
by 青木峰郎 (Minero AOKI), supervised by まつもとゆきひろ (Yukihiro MATSUMOTO)
"Ruby Hacking Guide"
An HTML version is available
Reference
•My blog
  http://ujihisa.blogspot.com
•All the patches I showed are there
end
Appendix: Imaginary Numbers
•Matz wrote a patch in [ruby-dev:38843]
•English translation: [ruby-core:24730]
•It won't be accepted
Appendix: Imaginary Numbers
> 3i
=> (0 + 3i)
> 3i.class
=> Complex
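Without the patch, stock Ruby 1.9 can build the same value with `Kernel#Complex`. For the record, later Ruby releases (2.1 and newer) did add an `i` suffix for imaginary literals, so `3i` itself works there. A quick check:

```ruby
z = Complex(0, 3) # the value the patched literal 3i stands for
p z               # prints (0+3i)
p z.class         # prints Complex
p z * z           # (3i)^2, prints (-9+0i)
```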
Appendix
•{you <3 ruby}
•f(x, y) = z
  (like f[x, y] = z as f.[]=(x, y, z))
•Annotations!
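The `f(x, y) = z` idea rests on a desugaring Ruby already performs for brackets. `g[x, y] = z` is a call to the method `[]=` with the index arguments followed by the assigned value. A small check in stock Ruby, where `Grid` is a made-up class:

```ruby
# `g[x, y] = z` is sugar for `g.[]=(x, y, z)`; the proposed `f(x, y) = z`
# would apply the same idea to ordinary method calls.
class Grid
  def initialize
    @cells = {}
  end

  def [](x, y)
    @cells[[x, y]]
  end

  def []=(x, y, value)
    @cells[[x, y]] = value
  end
end

g = Grid.new
g[1, 2] = :a    # sugar
g.[]=(3, 4, :b) # explicit method call, same effect
p g[1, 2], g[3, 4]
```

An assignment expression always evaluates to its right-hand side, whatever `[]=` returns. A hypothetical `f(x, y) = z` would presumably follow the same convention.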