Introduction of bison
    Presentation Transcript

    • Introduction to Bison
      天官
      2011-09-15
    • Flex and Bison
      Flex:
      • recognizes regular expressions.
      • divides the input stream into pieces (tokens).
      VS
      Bison:
      • builds programs that handle structured input.
      • takes these pieces and groups them together logically.
      statement: NAME '=' expression
      expression: NUMBER '+' NUMBER
      | NUMBER '-' NUMBER
      terminal symbol:
      Symbols produced by the lexer are called terminal symbols or tokens.
      nonterminal symbol:
      Symbols defined on the left-hand side of rules are called nonterminal symbols or nonterminals.
    • Shift/Reduce Parsing
      Shift
      As the parser reads tokens, each time it reads a token that doesn't complete a rule, it pushes the token on an internal stack and switches to a new state reflecting the token it just read. This action is called a shift.
      Reduce
      When it has found all the symbols that constitute the right-hand side of a rule, it pops the right-hand side symbols off the stack, pushes the left-hand side symbol onto the stack, and switches to a new state reflecting the new symbol on the stack. This action is called a reduction.
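The shift and reduce steps above can be sketched by hand in C. This is only an illustration for the small grammar expr: expr '+' digit | expr '-' digit | digit, with single-digit operands. A generated Bison parser decides what to do from state tables; this sketch simply inspects the top of the stack instead. The names Parser, try_reduce and parse_expr are invented for the example.

```c
#include <ctype.h>

/* Stack symbols: 'd' = digit token, 'E' = expr nonterminal,
   '+' and '-' stand for themselves. */
typedef struct {
    char sym[64];   /* symbol stack */
    int  val[64];   /* semantic values, parallel to sym */
    int  top;       /* number of entries in use */
    int  shifts;    /* how many shifts were performed */
    int  reduces;   /* how many reductions were performed */
} Parser;

/* Try to reduce the top of the stack by one rule; return 1 if a rule applied. */
static int try_reduce(Parser *p)
{
    int t = p->top;

    /* expr: expr '+' digit | expr '-' digit */
    if (t >= 3 && p->sym[t - 3] == 'E' && p->sym[t - 1] == 'd' &&
        (p->sym[t - 2] == '+' || p->sym[t - 2] == '-')) {
        int l = p->val[t - 3], r = p->val[t - 1];
        p->val[t - 3] = (p->sym[t - 2] == '+') ? l + r : l - r;
        p->top = t - 2;      /* pop three symbols, push E (already in place) */
        p->reduces++;
        return 1;
    }
    /* expr: digit (only possible at the bottom of the stack) */
    if (t == 1 && p->sym[0] == 'd') {
        p->sym[0] = 'E';
        p->reduces++;
        return 1;
    }
    return 0;
}

/* Parse input such as "3+8-2". Returns 0 on success, -1 on a syntax error. */
int parse_expr(const char *s, int *value, int *shifts, int *reduces)
{
    Parser p = {0};

    for (; *s; s++) {
        if (*s == ' ')
            continue;
        if (p.top >= 64)
            return -1;
        if (isdigit((unsigned char)*s)) {
            p.sym[p.top] = 'd';
            p.val[p.top] = *s - '0';
        } else if (*s == '+' || *s == '-') {
            p.sym[p.top] = *s;
            p.val[p.top] = 0;
        } else {
            return -1;
        }
        p.top++;             /* shift: push the token */
        p.shifts++;
        while (try_reduce(&p))
            ;                /* reduce as long as a right-hand side is complete */
    }
    if (p.top != 1 || p.sym[0] != 'E')
        return -1;           /* input ended before a complete expr was seen */
    *value = p.val[0];
    *shifts = p.shifts;
    *reduces = p.reduces;
    return 0;
}
```

Calling parse_expr("3+8-2", &v, &s, &r) performs five shifts (one per token) and three reductions, and leaves 9 in v.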
    • Parsing methods
      Bison parsers can use either of two parsing methods, known as LALR(1) and GLR.
      LALR(1) (Look Ahead Left to Right with one token of lookahead) is less powerful but considerably faster and easier to use than GLR.
      GLR (Generalized Left to Right).
      The most common kind of grammar that computer parsers handle is a context-free grammar (CFG).
      The standard form for writing down a CFG is Backus-Naur Form (BNF).
    • LR parser
      An LR parser reads input from Left to right and produces a Rightmost derivation in reverse.
      The term LR(k) parser is also used; where the k refers to the number of unconsumed "look ahead" input symbols that are used in making parsing decisions.
      Usually k is 1 and the term LR parser is often intended to refer to this case. (LALR(1))
    • look ahead
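      LALR(1) cannot deal with grammars that need more than one token of lookahead to tell whether it has matched a rule.
      Not supported: after HORSE the next token is AND in both cases, so one token of lookahead cannot tell cart_animal from work_animal.
      phrase: cart_animal AND CART
      | work_animal AND PLOW
      cart_animal: HORSE | GOAT
      work_animal: HORSE | OX
      Supported: without AND, the lookahead token is CART or PLOW.
      phrase: cart_animal CART
      | work_animal PLOW
      cart_animal: HORSE | GOAT
      work_animal: HORSE | OX
      OR let HORSE belong to only one rule:
      phrase: cart_animal AND CART
      | work_animal AND PLOW
      cart_animal: HORSE | GOAT
      work_animal: OX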
    • Rightmost Derivation
      Grammar:
      1. expr → expr – digit
      2. expr → expr + digit
      3. expr → digit
      4. digit → 0|1|2|…|9
      Example input:
      3 + 8 - 2
      The rightmost non-terminal is replaced in each step
      Rule 1
      expr → expr – digit
      Rule 4
      expr – digit → expr – 2
      Rule 2
      expr – 2 → expr + digit – 2
      Rule 4
      expr + digit – 2 → expr + 8 – 2
      Rule 3
      expr + 8 – 2 → digit + 8 – 2
      Rule 4
      digit + 8 – 2 → 3 + 8 – 2
    • Leftmost Derivation
      The leftmost non-terminal is replaced in each step
      Rule 1
      expr → expr – digit
      Rule 2
      expr – digit → expr + digit – digit
      Rule 3
      expr + digit – digit → digit + digit – digit
      Rule 4
      digit + digit – digit → 3 + digit – digit
      Rule 4
      3 + digit – digit → 3 + 8 – digit
      Rule 4
      3 + 8 – digit → 3 + 8 – 2
      [Figure: parse tree for 3 + 8 – 2, with numbered nodes]
    • Leftmost Derivation
      Grammar:
      1. expr → expr – digit
      2. expr → expr + digit
      3. expr → digit
      4. digit → 0|1|2|…|9
      Example input:
      3 + 8 - 2
      The leftmost non-terminal is replaced in each step
      Rule 1
      expr → expr – digit
      Rule 2
      expr – digit → expr + digit – digit
      Rule 3
      expr + digit – digit → digit + digit – digit
      Rule 4
      digit + digit – digit → 3 + digit – digit
      Rule 4
      3 + digit – digit → 3 + 8 – digit
      Rule 4
      3 + 8 – digit → 3 + 8 – 2
    • Context-Free Grammars
      A context-free grammar G is defined by the 4-tuple:
      G = (V, Σ, R, S) where
      V is a finite set; each element v ∈ V is called a non-terminal character or a variable. Each variable represents a different type of phrase or clause in the sentence. Variables are also sometimes called syntactic categories. Each variable defines a sub-language of the language defined by G.
      Σ is a finite set of terminals, disjoint from V, which make up the actual content of the sentence. The set of terminals is the alphabet of the language defined by the grammar G.
      R is a finite relation from V to (V ∪ Σ)*. The members of R are called the (rewrite) rules or productions of the grammar.
      S is the start variable (or start symbol), used to represent the whole sentence (or program). It must be an element of V.
      The asterisk represents the Kleene star operation.
    • Context-free language
      The language of grammar G = (V, Σ, R, S) is the set
      L(G) = { ω ∈ Σ* : S ⇒* ω }
      where S ⇒* ω means that ω can be derived from S by applying zero or more rules.
      A language L is said to be a context-free language (CFL) if there exists a CFG G such that L = L(G).
    • Context-Free Grammars
      Comprised of
      A set of tokens or terminal symbols
      A set of non-terminal symbols
      A set of rules or productions which express the legal relationships between symbols
      A start or goal symbol
      Example:
      expr → expr – digit
      expr → expr + digit
      expr → digit
      digit → 0|1|2|…|9
      • Tokens: -,+,0,1,2,…,9
      • Non-terminals: expr, digit
      • Start symbol: expr
    • A Bison Parser
      A bison specification has the same three-part structure as a flex specification.
      ... definition section ...
      %%
      ... rules section ...
      %%
      ... user subroutines ...
      The first section, the definition section, handles control information for the parser and generally sets up the execution environment in which the parser will operate.
      The second section contains the rules for the parser.
      The third section is C code copied verbatim into the generated C program.
    • Terms
      Symbols are strings of letters, digits, periods, and underscores that do not start with a digit.
      error is reserved for error recovery.
      Do not use C reserved words or bison's own symbols such as yyparse.
      Symbols produced by the lexer are called terminal symbols or tokens
      Those that are defined on the left-hand side of rules are called nonterminal symbols or nonterminals.
    • Structure of a Bison Specification
      ... definition section ...
      %%
      ... rules section ...
      %%
      ... user subroutines ...
    • Literal Block
      %{
      ... C code and declarations ...
      %}
      The contents of the literal block are copied verbatim to the generated C source file near the beginning, before the beginning of yyparse().
      Usually contains declarations of variables and functions, as well as #include.
      Bison also provides an experimental %code POS { ... } where POS is a keyword to suggest where in the generated parser the code should go.
    • Declaration
      %parse-param
      %require "2.4"
      declares the minimum version of bison needed to compile the grammar
      %start
      identifies the top-level rule (by default, the first rule)
      %union
      %token
      %type
      %left
      %right
      %nonassoc
      %expect
    • Token
      %token defines the terminals.
      Bison treats a character in single quotes as a token:
      expr: '(' expr ')';
      Bison also allows you to declare strings as aliases for tokens:
      %token NE "!="
      %%
      ...
      expr: expr "!=" expr;
      This defines the token NE and lets you use NE and "!=" interchangeably in the parser. The lexer must still return the internal token value for NE when the token is read, not a string.
    • Parse-param
      Normally, you call yyparse() with no arguments. If you need to, you can add parameters to its definition:
      %parse-param { char *modulename }
      %parse-param { int intensity }
      This allows you to call yyparse("mymodule", 42)
    • Type
      The %union declaration specifies the entire list of possible types
      %token is used for declaring token types
      %type is used for declaring nonterminal symbols
      %{
      #include "calc.h" /* Contains definition of `symrec' */
      %}
      %union {
      double val; /* For returning numbers. */
      symrec *tptr; /* For returning symbol-table pointers */
      }
      %token <tptr> VAR FNCT /* Variable and Function */
      %type <val> exp
      %%
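As a rough picture of what these declarations mean on the C side: %union becomes a C union, and the <val> or <tptr> tag on a symbol picks the member used when an action writes $$ or $1. The sketch below spells that out by hand. The fields of symrec are hypothetical (the slide does not show calc.h), and SemanticValue and the two action_ functions are names invented here, not something Bison generates under these names.

```c
/* Hypothetical stand-in for the symrec type that calc.h would provide. */
typedef struct symrec {
    const char *name;   /* assumed field: identifier text */
    double      value;  /* assumed field: current value of the variable */
} symrec;

/* What "%union { double val; symrec *tptr; }" amounts to in C. */
typedef union {
    double  val;   /* used by symbols declared with <val>  */
    symrec *tptr;  /* used by symbols declared with <tptr> */
} SemanticValue;

/* Hand-written equivalent of an action such as
       exp: exp '+' exp { $$ = $1 + $3; }
   where every symbol involved is typed <val>. */
SemanticValue action_add(SemanticValue lhs, SemanticValue rhs)
{
    SemanticValue result;
    result.val = lhs.val + rhs.val;
    return result;
}

/* Hand-written equivalent of an action such as
       exp: VAR { $$ = $1->value; }
   $1 is typed <tptr>, so Bison would read the tptr member;
   $$ is typed <val>, so it would write the val member. */
SemanticValue action_var(SemanticValue var)
{
    SemanticValue result;
    result.val = var.tptr->value;
    return result;
}
```

Because the members share storage, an action must read the member that was last written; the type tags let Bison choose it for you.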
    • Structure of a Bison Specification
      ... definition section ...
      %%
      ... rules section ...
      %%
      ... user subroutines ...
    • Actions
      An action is C code executed when bison matches a rule in the grammar.
      The action can refer to the values associated with the symbols in the rule by using a dollar sign followed by a number.
      date: month '/' day '/' year { printf("date %d-%d-%d found", $1, $3, $5); } ;
      The name $$ refers to the value for the left-hand side (LHS) symbol.
      For rules with no action, bison uses the following default:
      { $$ = $1; }
    • Rules
      Recursive Rules
      A rule may refer to itself, which lets it match a sequence of any length:
      numberlist : /* empty */
      | numberlist NUMBER
      ;
      In most cases, Bison handles left recursion much more efficiently than right recursion.
      exprlist: exprlist ',' expr; /* left recursion */
      or
      exprlist: expr ',' exprlist; /* right recursion */
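The difference in efficiency comes from how deep the parser stack gets. The following C sketch is a simplified simulation, not Bison code: it counts stack entries for a list of n items, treats each expr as a single stack symbol, and assumes the base case exprlist: expr, which the slide leaves out. The function names are invented for the example.

```c
/* Push one symbol and remember the deepest point reached. */
static void push(int *depth, int *max)
{
    if (++*depth > *max)
        *max = *depth;
}

/* Left recursion:  exprlist: expr | exprlist ',' expr
   Each item can be reduced as soon as it has been read. */
int max_depth_left(int n)
{
    int depth = 0, max = 0;
    for (int i = 0; i < n; i++) {
        if (i > 0)
            push(&depth, &max);   /* shift ',' */
        push(&depth, &max);       /* shift expr */
        depth = 1;                /* reduce: stack holds a single exprlist again */
    }
    return max;
}

/* Right recursion:  exprlist: expr | expr ',' exprlist
   Nothing can be reduced until the last item has been read. */
int max_depth_right(int n)
{
    int depth = 0, max = 0;
    for (int i = 0; i < n; i++) {
        if (i > 0)
            push(&depth, &max);   /* shift ',' */
        push(&depth, &max);       /* shift expr */
    }
    /* Only at end of input do the reductions start unwinding the stack. */
    return max;
}
```

For a 1000-item list the left-recursive form never needs more than 3 stack entries, while the right-recursive form needs 1999, so a long enough list can overflow the parser stack.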
    • Special Characters
      % All of the declarations in the definition section start with %.
      $ In actions, a dollar sign introduces a value reference.
      @ In actions, an @ sign introduces a location reference, such as @2 for the location of the second symbol in the RHS.
      ' Literal tokens are enclosed in single quotes.
      " Bison lets you declare quoted string as parser alias for tokens.
      <> In a value reference in an action, you can override the value's default type by enclosing the type name in angle brackets.
      {} The C code in actions is enclosed in curly braces.
      ; Each rule in the rules section should end with a semicolon.
      | Separates multiple rules that share the same LHS.
      : Separates the left-hand side of a rule from the right-hand side.
      _ Symbols may include underscores along with letters, digits, and periods.
      . Symbols may include periods along with letters, digits, and underscores.
    • Reserved
      YYABORT
      In an action makes the parser routine yyparse() return immediately with a nonzero value, indicating failure.
      YYACCEPT
      In an action makes the parser routine yyparse() return immediately with a value 0, indicating success.
      YYBACKUP
      The macro YYBACKUP lets you unshift the current token and replace it with something else.
      sym: TOKEN { YYBACKUP(newtok, newval); }
      It is extremely difficult to use YYBACKUP() correctly, so you're best off not using it.
    • Reserved
      yyclearin
      The macro yyclearin in an action discards a lookahead token if one has been read. It is most often useful in error recovery in an interactive parser, to put the parser into a known state after an error:
      stmtlist : stmt | stmtlist stmt;
      stmt : error { reset_input(); yyclearin; };
      YYDEBUG
      To include the trace code, either use the -t flag on the bison command line or define the C preprocessor symbol YYDEBUG to be nonzero, either on the C compiler command line or by including something like this in the definition section:
      %{
      #define YYDEBUG 1
      %}
    • Ambiguity and Conflicts
      The grammar is truly ambiguous
      Shift/Reduce Conflicts
      Reduce/Reduce Conflicts
      The grammar is unambiguous, but the standard parsing technique that bison uses is not powerful enough to parse it (it would need to look more than one token ahead).
      This limitation of LALR(1) was covered earlier.
    • Reduce/Reduce Conflicts
      A reduce/reduce conflict occurs when the same token could complete two different rules.
      %%
      prog: proga | progb;
      proga: 'X';
      progb: 'X';
    • Shift/Reduce Conflicts
      %type <a> expr
      ...
      %%
      ...
      expr: expr '+' expr
      { $$ = newast('+', $1, $3); }
      | expr '-' expr
      { $$ = newast('-', $1, $3); }
      | expr '*' expr
      { $$ = newast('*', $1, $3); }
      | expr '/' expr
      { $$ = newast('/', $1, $3); }
      | '|' expr
      { $$ = newast('|', $2, NULL); }
      | '(' expr ')'
      { $$ = $2; }
      | '-' expr
      { $$ = newast('M', $2, NULL); }
      | NUMBER { $$ = newnum($1); }
      ;
      %%
      Example 2+3*4
    • Problem
      Example 2+3*4
      2 shift NUMBER
      E reduce E->NUMBER
      E + shift +
      E + 3 shift NUMBER
      E + E reduce E->NUMBER
      At this point, the parser looks at the * and could either reduce 2+3 to an expression using
      expr: expr '+' expr
      or shift the *, expecting to be able to reduce
      expr: expr '*' expr
      later on.
    • Analysis
      The problem is that we haven't told bison about the precedence and associativity of the operators.
      Precedence controls which operators execute first in an expression.
      In an expression grammar, operators are grouped into levels of precedence from lowest to highest. The total number of levels depends on the language. The C language is notorious for having too many precedence levels, a total of 15 levels.
      Associativity controls the grouping of operators at the same precedence level.
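Once precedence and associativity are known, the choice between shifting and reducing becomes mechanical: if the operator already on the stack has higher precedence than the lookahead operator, or equal precedence and is left associative, reduce; otherwise shift. The C sketch below applies that rule in a small hand-written evaluator for single-digit operands. It illustrates the decision only and is not how a generated parser is structured; prec, reduce_top and eval are names invented for the example.

```c
#include <ctype.h>

/* Precedence levels, lowest to highest; 0 means "not an operator". */
static int prec(char op)
{
    switch (op) {
    case '+': case '-': return 1;
    case '*': case '/': return 2;
    default:            return 0;
    }
}

/* Reduce "value op value" on top of the stacks. Returns -1 on division by zero. */
static int reduce_top(int *vals, int *nv, char *ops, int *no)
{
    int b = vals[--*nv];
    int a = vals[--*nv];
    char op = ops[--*no];

    if (op == '/' && b == 0)
        return -1;
    vals[(*nv)++] = op == '+' ? a + b
                  : op == '-' ? a - b
                  : op == '*' ? a * b
                  :             a / b;
    return 0;
}

/* Evaluate e.g. "2+3*4"; all four operators are treated as left associative.
   Returns 0 on success, -1 on a syntax error or division by zero. */
int eval(const char *s, int *out)
{
    int  vals[32];
    char ops[32];
    int  nv = 0, no = 0, expect_operand = 1;

    for (; *s; s++) {
        if (*s == ' ')
            continue;
        if (expect_operand) {
            if (!isdigit((unsigned char)*s) || nv >= 32)
                return -1;
            vals[nv++] = *s - '0';          /* shift NUMBER */
            expect_operand = 0;
        } else {
            if (prec(*s) == 0)
                return -1;
            /* Lookahead is the operator *s. Reduce while the operator on the
               stack has higher precedence, or the same precedence (left
               associativity). Otherwise fall through and shift. */
            while (no > 0 && prec(ops[no - 1]) >= prec(*s))
                if (reduce_top(vals, &nv, ops, &no) != 0)
                    return -1;
            if (no >= 32)
                return -1;
            ops[no++] = *s;                 /* shift the operator */
            expect_operand = 1;
        }
    }
    if (expect_operand)
        return -1;                          /* empty input or trailing operator */
    while (no > 0)
        if (reduce_top(vals, &nv, ops, &no) != 0)
            return -1;
    *out = vals[0];
    return 0;
}
```

With this rule, 2+3*4 shifts the * and evaluates to 14, while 2-3-4 reduces 2-3 first and evaluates to -5.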
    • Implicit Solution
      %type <a> expr expr1 expr2
      ...
      %%
      ...
      expr : expr '+' expr1 { $$ = newast('+', $1, $3); }
      | expr '-' expr1 { $$ = newast('-', $1, $3); }
      | expr1 { $$ = $1; }
      ;
      expr1: expr1 '*' expr2 { $$ = newast('*', $1, $3); }
      | expr1 '/' expr2 { $$ = newast('/', $1, $3); }
      | expr2 { $$ = $1; }
      ;
      expr2: '|' expr2 { $$ = newast('|', $2, NULL); }
      | '(' expr ')' { $$ = $2; }
      | '-' expr2 { $$ = newast('M', $2, NULL); }
      | NUMBER { $$ = newnum($1); }
      ;
      %%
    • Explicit Solution
      %left '+' '-'
      %left '*' '/'
      %nonassoc '|' UMINUS
      %type <a> expr
      ...
      %%
      ...
      expr: expr '+' expr { $$ = newast('+', $1, $3); }
      | expr '-' expr { $$ = newast('-', $1, $3); }
      | expr '*' expr { $$ = newast('*', $1, $3); }
      | expr '/' expr { $$ = newast('/', $1, $3); }
      | '|' expr { $$ = newast('|', $2, NULL); }
      | '(' expr ')' { $$ = $2; }
      | '-' expr %prec UMINUS { $$ = newast('M', $2, NULL); }
      | NUMBER { $$ = newnum($1); }
      ;
      %%
    • Explicit Solution
      %left, %right, and %nonassoc declarations define the order of precedence from lowest to highest.
      %left, left associative
      %right, right associative
      %nonassoc, no associativity
      UMINUS, pseudo token standing for unary minus
      %prec UMINUS, %prec tells bison to use the precedence of UMINUS for this rule.
    • IF/THEN/ELSE conflict
      When Not to Use Precedence Rules
      Precedence rules are easy to understand in expression grammars and when resolving the "dangling else" conflict in if/then/else constructs.
      In other situations, their effect can be extremely difficult to understand.
      stmt: IF '(' cond ')' stmt
      | IF '(' cond ')' stmt ELSE stmt
      | TERMINAL
      cond: TERMINAL
      Ambiguous!!!
      IF ( cond ) IF ( cond ) stmt ELSE stmt
      Which one?
      IF ( cond ) { IF ( cond ) stmt } ELSE stmt
      IF ( cond ) { IF ( cond ) stmt ELSE stmt }
    • Implicit Solution
      stmt : matched
      | unmatched
      ;
      matched : other_stmt
      | IF expr THEN matched ELSE matched
      ;
      unmatched : IF expr THEN stmt
      | IF expr THEN matched ELSE unmatched
      ;
      other_stmt: /* rules for other kinds of statement */
      ...
      Each ELSE now pairs with the nearest unmatched IF:
      IF ( cond ) { IF ( cond ) stmt ELSE stmt }
    • Explicit Solution
      %nonassoc THEN
      %nonassoc ELSE
      %%
      stmt : IF expr THEN stmt
      | IF expr THEN stmt ELSE stmt
      ;
      Equivalent to:
      %nonassoc LOWER_THAN_ELSE
      %nonassoc ELSE
      %%
      stmt : IF expr stmt %prec LOWER_THAN_ELSE
      | IF expr stmt ELSE stmt
      ;
      Either way, ELSE has higher precedence than the rule without ELSE, so bison shifts and the ELSE pairs with the nearest IF:
      IF ( cond ) { IF ( cond ) stmt ELSE stmt }
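Preferring the shift means that an ELSE is always consumed by the innermost IF that is still open. The C sketch below shows the same choice in a tiny hand-written recursive-descent parser rather than in a Bison grammar. Its token alphabet is an assumption made for brevity: 'i' stands for "IF ( cond )", 'e' for ELSE and 's' for a simple statement. The output puts braces around each IF statement so the grouping is visible. The names emit, stmt and group_ifs are invented for the example.

```c
#include <stddef.h>

/* Append one character if there is room, always leaving space for '\0'. */
static void emit(char *out, size_t cap, size_t *len, char c)
{
    if (*len + 1 < cap)
        out[(*len)++] = c;
}

/* stmt: 'i' stmt | 'i' stmt 'e' stmt | 's'
   Returns a pointer just past the parsed statement, or NULL on a syntax error. */
static const char *stmt(const char *p, char *out, size_t cap, size_t *len)
{
    if (*p == 's') {
        emit(out, cap, len, 's');
        return p + 1;
    }
    if (*p != 'i')
        return NULL;

    emit(out, cap, len, '{');
    emit(out, cap, len, 'i');
    p = stmt(p + 1, out, cap, len);
    if (p == NULL)
        return NULL;
    if (*p == 'e') {
        /* Same choice as preferring the shift: the innermost open IF,
           which is this call, takes the ELSE. */
        emit(out, cap, len, 'e');
        p = stmt(p + 1, out, cap, len);
        if (p == NULL)
            return NULL;
    }
    emit(out, cap, len, '}');
    return p;
}

/* Write the grouped form of tokens into out. Returns 0 on success, -1 on error.
   Output that does not fit in cap bytes is silently truncated. */
int group_ifs(const char *tokens, char *out, size_t cap)
{
    size_t len = 0;
    const char *end;

    if (cap == 0)
        return -1;
    end = stmt(tokens, out, cap, &len);
    out[len] = '\0';
    return (end != NULL && *end == '\0') ? 0 : -1;
}
```

For the input "iises" (IF IF stmt ELSE stmt) the result is "{i{ises}}": the ELSE sits inside the inner braces, matching the grouping shown on the slide.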
    • expect
      Occasionally you may have a grammar that has a few conflicts, you are confident that bison will resolve them the way you want, and it's too much hassle to rewrite the grammar to get rid of them.
      %expect N tells bison that your parser should have N shift/reduce conflicts.
      %expect-rr N tells it how many reduce/reduce conflicts to expect.
    • Common Bugs In Bison Programs
      Infinite Recursion
      The rule below has no non-recursive alternative, so it can never finish matching:
      %%
      xlist: xlist 'X' ;
      should be ==>
      %%
      xlist : 'X'
      | xlist 'X'
      ;
    • Common Bugs In Bison Programs
      Interchanging Precedence
      %token NUMBER
      %left PLUS
      %left MUL
      %%
      expr : expr PLUS expr %prec MUL
      | expr MUL expr %prec PLUS
      | NUMBER
      ;
    • Lexical Feedback
      Parsers can sometimes feed information back to the lexer to handle otherwise difficult situations.
      E.g. syntax like this:
      message ( any characters )
      /* parser */
      %{
      int parenstring = 0;
      %}
      ...
      %%
      statement: MESSAGE { parenstring = 1; } '(' STRING ')';
    • Lexical Feedback
      /* lexer */
      %{
      extern int parenstring;
      %}
      %s PSTRING
      %%
      "message" return MESSAGE;
      "(" {
      if(parenstring) BEGIN PSTRING;
      return '(';
      }
      <PSTRING>[^)]* {
      yylval.svalue = strdup(yytext);
      BEGIN INITIAL;
      return STRING;
      }
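To make the feedback loop concrete without flex, here is a hand-written C lexer that follows the same idea. It is a sketch under several assumptions: the token codes and the names next_token, lex_text and in_pstring are invented, in_pstring stands in for the PSTRING start state, and the flag is cleared as soon as it has been used, a step the slide's lexer does not show.

```c
#include <string.h>

/* Hypothetical token codes; a real parser would take them from Bison's header. */
enum { TOK_EOF = 0, TOK_MESSAGE = 256, TOK_STRING = 257 };

int  parenstring = 0;       /* set by the parser, read by the lexer */
char lex_text[128];         /* text of the most recent STRING token */
static int in_pstring = 0;  /* plays the role of the PSTRING start state */

/* Return the next token from *pp and advance *pp past it. */
int next_token(const char **pp)
{
    const char *p = *pp;

    if (in_pstring) {
        /* Like <PSTRING>[^)]* : take everything up to the closing paren. */
        size_t n = strcspn(p, ")");
        size_t copy = n < sizeof lex_text ? n : sizeof lex_text - 1;
        memcpy(lex_text, p, copy);
        lex_text[copy] = '\0';
        *pp = p + n;
        in_pstring = 0;     /* like BEGIN INITIAL */
        return TOK_STRING;
    }

    while (*p == ' ')
        p++;
    if (*p == '\0') {
        *pp = p;
        return TOK_EOF;
    }
    if (strncmp(p, "message", 7) == 0) {
        *pp = p + 7;
        return TOK_MESSAGE;
    }
    if (*p == '(' && parenstring) {
        in_pstring = 1;     /* like BEGIN PSTRING */
        parenstring = 0;    /* assumption: the feedback flag is used only once */
    }
    *pp = p + 1;
    return (unsigned char)*p;   /* single-character literal token */
}
```

In use, the parser's mid-rule action sets parenstring = 1 after it sees MESSAGE, so the next "(" switches the lexer into string mode and the text up to ")" comes back as one STRING token, even if it contains another "(".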
    • Structure of a Bison Specification
      ... definition section ...
      %%
      ... rules section ...
      %%
      ... user subroutines ...
    • User subroutines Section
      This section typically includes routines called from the actions.
      Nothing special.
    • Discuss Everything