7 compiler lab


Writing a Bison file for syntax analysis, combining it with a Flex file, and compiling it

Published in: Technology
  • If a symbol neither is a token nor appears on the left side of a rule, it’s like an unreferenced variable in a C program. It doesn’t hurt anything, but it probably means the programmer made a mistake.

    2. TOKEN VALUES
        • When a flex scanner returns a stream of tokens, each token actually has two parts: the token and the token's value.
        • The token is a small integer.
        • The token numbers are arbitrary, except that token zero always means end-of-file.
        • When bison creates a parser, it assigns the token numbers automatically, starting at 258, and creates a .h file with definitions of the token numbers.
        Department of Computer Science - 14-18/4/12 - Compiler Engineering Lab
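As a rough illustration of the numbering described above, the token definitions in the generated header might look like the following C sketch. The token names are the ones declared in the calculator example in these slides; the exact layout of a real generated header depends on the bison version, and `token_name` is a hypothetical helper, not something bison provides.

```c
#include <stddef.h>

/* Sketch of the token definitions that bison -d writes into its .h file.
   The layout of a real generated header varies between bison versions. */
enum yytokentype {
    NUMBER = 258,   /* first automatically assigned token number */
    ADD,            /* 259 */
    SUB,            /* 260 */
    MUL,            /* 261 */
    DIV,            /* 262 */
    ABS,            /* 263 */
    EOL             /* 264 */
};

/* Hypothetical helper (not part of bison): map a token number to its name. */
const char *token_name(int tok)
{
    static const char *names[] = {
        "NUMBER", "ADD", "SUB", "MUL", "DIV", "ABS", "EOL"
    };
    if (tok == 0)
        return "end-of-file";   /* token zero always means end of input */
    if (tok < NUMBER || tok > EOL)
        return NULL;            /* not one of the named tokens */
    return names[tok - NUMBER];
}
```

Starting the named tokens above the character range leaves the small numbers free for single-character literal tokens.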
    3. REMINDER
        • The . pattern is the catchall that matches anything the other patterns didn't.
        • Quoted strings tell flex to use the strings as is, rather than interpreting them as regular expressions. Ex. "+" { printf("PLUS"); }
        • Important rule: in a flex file, if the action code returns, scanning resumes on the next call to yylex(); if it doesn't return, scanning resumes immediately.
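The return rule can be imitated by hand in plain C. This is only a sketch of the behaviour, not the code flex generates: `next_token`, `token_value`, and the token numbers are hypothetical names chosen for the illustration.

```c
#include <ctype.h>

enum { NUMBER = 258, ADD = 259, EOL = 260 };  /* hypothetical token numbers */

int token_value;   /* plays the role of yylval in this sketch */

/* Hand-written stand-in for yylex(); *p is the current scanning position. */
int next_token(const char **p)
{
    for (;;) {
        unsigned char c = (unsigned char)**p;
        if (c == '\0')
            return 0;                 /* token zero: end of input */
        if (c == ' ' || c == '\t') {
            (*p)++;                   /* no return: scanning resumes immediately */
            continue;
        }
        if (isdigit(c)) {
            token_value = 0;
            while (isdigit((unsigned char)**p))
                token_value = token_value * 10 + (*(*p)++ - '0');
            return NUMBER;            /* return: scanning resumes on the next call */
        }
        (*p)++;
        if (c == '+')
            return ADD;
        if (c == '\n')
            return EOL;
        /* Any other character lands here, like the catch-all "." pattern;
           this sketch simply ignores it and keeps scanning. */
    }
}
```

Calling `next_token` repeatedly on "12 + 3\n" skips the blanks without returning and hands back one token per call.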
    4. YYLVAL
        • yylval is a variable that stores the token value.
        • The value is usually defined as a union, so that different kinds of tokens can have different types of values (ex. an integer value, a floating-point value, or a pointer to an entry in the symbol table).
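A minimal C sketch of such a union follows. All names here (`token_value`, `symbol`, `INT_LIT`, and so on) are hypothetical; in a bison grammar the equivalent type would come from a `%union` declaration.

```c
/* Hypothetical symbol-table entry. */
struct symbol {
    const char *name;
    double value;
};

/* Hypothetical value type; bison generates the equivalent from %union. */
union token_value {
    int ival;             /* integer literal */
    double fval;          /* floating-point literal */
    struct symbol *sym;   /* pointer to a symbol-table entry */
};

enum { INT_LIT = 258, FLOAT_LIT, IDENT };   /* hypothetical token numbers */

/* The token number tells the parser which union member is valid. */
double value_as_double(int tok, union token_value v)
{
    switch (tok) {
    case INT_LIT:   return (double)v.ival;
    case FLOAT_LIT: return v.fval;
    case IDENT:     return v.sym->value;
    default:        return 0.0;
    }
}
```

The union itself does not record which member was last stored, so the token number has to carry that information.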
    5. YYWRAP()
        • The yywrap option: when the scanner receives an end-of-file indication from YY_INPUT, it calls the yywrap() function:
            • If yywrap() returns false (zero), it is assumed that the function has gone ahead and set up yyin to point to another input file, and scanning continues.
            • If it returns true (non-zero), the scanner terminates, returning 0 to its caller.
        • Note that in either case, the start condition remains unchanged; it does not revert to INITIAL.
        • If you do not supply your own version of yywrap(), you must either use %option noyywrap (in which case the scanner behaves as though yywrap() returned 1), or link with -lfl to obtain the default version of the routine, which always returns 1.
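One possible user-supplied yywrap() that moves on to the next input file is sketched below. It assumes a list of file names handed over through the hypothetical helper `set_pending_files`; in a real scanner `yyin` is defined by the generated lex.yy.c, so the local definition here would become `extern FILE *yyin;`.

```c
#include <stdio.h>

/* Defined locally so the sketch is self-contained; a real flex scanner
   already defines yyin in lex.yy.c. */
FILE *yyin;

/* Hypothetical list of input files still to be scanned. */
static const char **pending_files;
static int pending_count;

void set_pending_files(const char **names, int count)
{
    pending_files = names;
    pending_count = count;
}

/* Called by the scanner at end-of-file.
   Returns 0 after pointing yyin at the next file, 1 when no input is left. */
int yywrap(void)
{
    if (yyin != NULL && yyin != stdin) {
        fclose(yyin);
        yyin = NULL;
    }
    while (pending_count > 0) {
        const char *name = *pending_files++;
        pending_count--;
        yyin = fopen(name, "r");
        if (yyin != NULL)
            return 0;       /* more input: scanning continues */
        perror(name);       /* skip files that cannot be opened */
    }
    return 1;               /* no more input: the scanner returns 0 */
}
```

Skipping unreadable files instead of stopping is a design choice of this sketch, not something flex requires.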
    6. BISON PARSING
        • Any bison parser makes a parse tree as it parses its input:
            • In some applications, it creates the tree as a data structure in memory for later use.
            • In others, the tree is just implicit in the sequence of operations the parser does.
        • In order to write a parser, we need some way to describe the rules the parser uses to turn a sequence of tokens into a parse tree.
        • The most common kind of language that computer parsers handle is a context-free grammar (CFG).
    7. GRAMMARS & PARSING
        • The parser's job is to figure out the relationship among the input tokens.
        • A common way to display such relationships is a parse tree. (Ex. the arithmetic expression 1 * 2 + 3 * 4 + 5)
    8. GRAMMARS & PARSING
        • Multiplication has higher precedence than addition, so the first two expressions are 1 * 2 and 3 * 4.
        • Then those two expressions are added together, and that sum is then added to 5.
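The parse tree for 1 * 2 + 3 * 4 + 5 can be written out as a data structure in memory. The node layout and function names below are assumptions made for this sketch, not something bison generates.

```c
#include <stdlib.h>

/* A parse-tree node: a number (op == 0) or a binary operator. */
struct node {
    char op;                     /* '+', '*', or 0 for a leaf */
    int value;                   /* used only by leaves */
    struct node *left, *right;
};

static struct node *new_node(char op, int value,
                             struct node *left, struct node *right)
{
    struct node *n = malloc(sizeof *n);
    if (n == NULL)
        abort();
    n->op = op;
    n->value = value;
    n->left = left;
    n->right = right;
    return n;
}

struct node *leaf(int value) { return new_node(0, value, NULL, NULL); }

struct node *binary(char op, struct node *l, struct node *r)
{
    return new_node(op, 0, l, r);
}

/* Evaluate the tree bottom-up. */
int eval(const struct node *n)
{
    if (n->op == 0)
        return n->value;
    int l = eval(n->left);
    int r = eval(n->right);
    return n->op == '+' ? l + r : l * r;
}

void free_tree(struct node *n)
{
    if (n == NULL)
        return;
    free_tree(n->left);
    free_tree(n->right);
    free(n);
}

/* Tree for 1 * 2 + 3 * 4 + 5, grouped as ((1 * 2) + (3 * 4)) + 5. */
struct node *example_tree(void)
{
    return binary('+',
                  binary('+',
                         binary('*', leaf(1), leaf(2)),
                         binary('*', leaf(3), leaf(4))),
                  leaf(5));
}
```

Evaluating `example_tree()` multiplies first (2 and 12), adds those, and then adds 5.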
    9. GRAMMARS & PARSING
        • BNF for simple arithmetic expressions:
            <exp> ::= <factor> | <exp> + <factor>
            <factor> ::= NUMBER | <factor> * NUMBER
    10. GRAMMARS & PARSING
        • Each line is a rule that says how to create a branch of the parse tree.
        • In BNF, ::= can be read "is a" or "becomes", and | is "or".
        • The name on the left side of a rule is a symbol or term.
        • By convention, all tokens are considered to be symbols, but there are also symbols that are not tokens.
        • Useful BNF is invariably quite recursive, with rules that refer to themselves directly or indirectly.
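To see the two BNF rules at work, here is a small hand-written evaluator in C. It is a sketch of a recursive-descent style reading of the grammar, which is not how bison parses (bison generates a table-driven parser); the function names are hypothetical and there is no error handling.

```c
#include <ctype.h>

/* Skip blanks between tokens. */
static void skip_spaces(const char **p)
{
    while (**p == ' ')
        (*p)++;
}

/* NUMBER: a run of decimal digits. */
static int parse_number(const char **p)
{
    int n = 0;
    skip_spaces(p);
    while (isdigit((unsigned char)**p))
        n = n * 10 + (*(*p)++ - '0');
    return n;
}

/* <factor> ::= NUMBER | <factor> * NUMBER
   The left recursion becomes a loop, which keeps * left-associative. */
static int parse_factor(const char **p)
{
    int v = parse_number(p);
    for (skip_spaces(p); **p == '*'; skip_spaces(p)) {
        (*p)++;
        v *= parse_number(p);
    }
    return v;
}

/* <exp> ::= <factor> | <exp> + <factor> */
static int parse_exp(const char **p)
{
    int v = parse_factor(p);
    for (skip_spaces(p); **p == '+'; skip_spaces(p)) {
        (*p)++;
        v += parse_factor(p);
    }
    return v;
}

/* Convenience wrapper (hypothetical name). */
int eval_string(const char *s)
{
    return parse_exp(&s);
}
```

Because `<exp>` is built from `<factor>`, every multiplication is finished before any addition sees its operands, which is how the grammar encodes precedence.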
    11. BISON'S RULE INPUT LANGUAGE
        • Bison rules are basically BNF, with the punctuation simplified a little to make them easier to type.
        • Bison programs have the same three-part structure as flex programs, with declarations, rules, and C code.
        1. The Declarations Section:
            • Includes C code to be copied to the beginning of the generated C parser, again enclosed in %{ and %}.
            • Following that are %token token declarations, telling bison the names of the symbols in the parser that are tokens.
            • By convention, tokens have uppercase names, although bison doesn't require it.
            • Any symbols not declared as tokens have to appear on the left side of at least one rule in the program.
    12. BISON'S RULE INPUT LANGUAGE
        2. The Rules Section:
            • Includes rules in simplified BNF.
            • Bison uses a single colon : rather than ::=
            • A semicolon marks the end of a rule.
            • As in flex, the C action code goes in braces at the end of each rule.
            • The symbol on the left side of the first rule is the start symbol, the one that the entire input has to match.
            • There can be, and usually are, other rules with the same start symbol on the left.
        • Note:
            • Bison automatically does the parsing for you, remembering what rules have been matched, so the action code maintains the values associated with each symbol.
            • Bison parsers also perform side effects such as creating data structures for later use or, for example, printing out results.
    13. BISON'S RULE INPUT LANGUAGE
        2. The Rules Section:
            • Each symbol in a bison rule has a value; the value of the target symbol (the one to the left of the colon) is called $$ in the action code, and the values on the right are numbered $1, $2, and so forth, up to the number of symbols in the rule.
            • The values of tokens are whatever was in yylval when the scanner returned the token.
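One way to picture $$ and $1, $2, $3 is as positions on a stack of values that the parser keeps alongside the symbols it has recognized. The following C sketch is a simplified model under that assumption, with hypothetical names; the real generated parser is considerably more involved.

```c
#define STACK_MAX 64

static int value_stack[STACK_MAX];
static int top;                        /* number of values on the stack */

void push_value(int v)
{
    if (top < STACK_MAX)
        value_stack[top++] = v;
}

/* Model of reducing by "exp: exp ADD factor { $$ = $1 + $3; }".
   The three right-hand-side values are the top three stack entries. */
void reduce_add(void)
{
    int d1 = value_stack[top - 3];     /* $1: value of exp */
    /* value_stack[top - 2] is $2, the value of the ADD token (unused) */
    int d3 = value_stack[top - 1];     /* $3: value of factor */
    top -= 3;                          /* pop the right-hand side */
    push_value(d1 + d3);               /* push $$ */
}

int top_value(void)
{
    return value_stack[top - 1];
}
```

For the input 2 + 3, the values 2, the ADD token's value, and 3 are pushed in turn, and `reduce_add()` replaces them with the single value 5.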
    14. EXAMPLE #1: BISON CALCULATOR.Y FILE
        • Example #1: Write bison code, including the BNF, for the first version of our calculator.
        • Make sure to handle syntax errors.
    15. EXAMPLE #1: SIMPLE CALCULATOR BISON FILE
            /* simplest version of calculator */
            %{
            #include <stdio.h>
            int yylex(void);
            void yyerror(const char *s);
            %}

            /* declare tokens */
            %token NUMBER
            %token ADD SUB MUL DIV ABS
            %token EOL

            %%

            calclist: /* nothing */
             | calclist exp EOL { printf("= %d\n", $2); }   /* EOL is end of an expression */
             ;

            exp: factor                  /* default $$ = $1 */
             | exp ADD factor { $$ = $1 + $3; }
             | exp SUB factor { $$ = $1 - $3; }
             ;

            factor: term                 /* default $$ = $1 */
             | factor MUL term { $$ = $1 * $3; }
             | factor DIV term { $$ = $1 / $3; }
             ;

            term: NUMBER                 /* default $$ = $1 */
             | ABS term { $$ = $2 >= 0 ? $2 : -$2; }
             ;

            %%
    16. EXAMPLE #1: SIMPLE CALCULATOR BISON FILE (CONTINUED)
        • The third section, after the second %%, holds the C code:
            int main(int argc, char **argv)
            {
                yyparse();
                return 0;
            }

            /* called by the parser when it detects a syntax error */
            void yyerror(const char *s)
            {
                fprintf(stderr, "error: %s\n", s);
            }
    17. EXERCISES: BISON .Y FILE(S)
        • E2: Alter Example #1 to accept one-line comments (ex. // this is a comment):
            "//".* /* ignore comments */
        • E3: Allow the calculator to recognize hexadecimal numbers:
            0x[a-f0-9]+
          Use the strtol function (stdlib.h) to convert the matched string to a long.
        • E4: Allow the calculator to understand the mod (%) operator: a % b, where a and b are integers, gives the remainder when a is divided by b.
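For the hexadecimal and mod exercises, the C side of the actions might look like this sketch. The function names are hypothetical; in the scanner the argument would be yytext, and returning 0 for a zero divisor is an arbitrary choice made here, not part of the exercise.

```c
#include <stdlib.h>

/* Convert matched text such as "0x1f" to a number.
   With base 16, strtol accepts an optional "0x" prefix. */
long hex_token_value(const char *text)
{
    return strtol(text, NULL, 16);
}

/* Value of a % b, guarding against a zero divisor. */
long mod_value(long a, long b)
{
    return b == 0 ? 0 : a % b;
}
```

Since the calculator's values are plain int by default, the result of `hex_token_value` would be narrowed when it is stored in yylval.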
    18. COMPILING FLEX AND BISON PROGRAMS TOGETHER
        1. Bison will create a header file for us, which includes both definitions of the token numbers and a definition of yylval.
        2. Include bison's header file in the flex file:
            %{
            #include "cal.tab.h"
            %}
            %%
            (same rules as before, and no code in the third section)
        3. Delete the testing main routine in the third section of the scanner, since the parser will now call the scanner.
    19. EXAMPLE #1: FLEX AND BISON TOGETHER FOR CALCULATOR
        • The build process is now complex enough to be worth putting into a Makefile:
            bison -d cal.y
            flex cal.l
            cc cal.tab.c lex.yy.c -lfl
        • Explanation:
            • First it runs bison with the -d (for "definitions" file) flag, which creates cal.tab.c and cal.tab.h,
            • and it runs flex to create lex.yy.c.
            • Then it compiles them together, along with the flex library.
    20. EXAMPLE #1: FLEX AND BISON TOGETHER FOR CALCULATOR
        • In this parser, the first two rules, which define the symbol calclist, implement a loop that reads an expression terminated by a newline and prints its value.
        • The definition of calclist uses a common two-rule recursive idiom to implement a sequence or list:
            1. the first rule is empty and matches nothing;
            2. the second adds an item to the list. The action in the second rule prints the value of the exp in $2.
        • In the absence of an explicit action on a rule, the parser assigns $1 to $$.
    21. AMBIGUOUS GRAMMARS
        • Why shouldn't we just write the following?
            exp: exp ADD exp
             | exp SUB exp
             | exp MUL exp
             | exp DIV exp
             | ABS exp
             | NUMBER
             ;
        • There are two answers: precedence and ambiguity.
    22. AMBIGUOUS GRAMMARS
        • The separate symbols for term, factor, and exp tell bison to handle ABS, then MUL and DIV, then ADD and SUB.
        • In general, whenever a grammar has multiple levels of precedence where one operator binds "tighter" than another, the grammar needs a level of rule for each level of precedence.
    23. AMBIGUOUS GRAMMARS
        • What about this?
            exp: exp ADD exp
             | exp SUB exp
             | factor
             ;
        • This grammar is ambiguous: the expression 1 - 2 + 3 could be parsed as (1 - 2) + 3 or as 1 - (2 + 3), two different expressions with two different values (2 and -4).
        • Bison's usual parsing algorithm cannot handle an ambiguous grammar; it reports conflicts when it processes one, so ambiguity should be treated as an error.
        • That is, any parser that bison creates has exactly one way to parse any input.
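An input such as 1 - 2 + 3 shows why the choice of grouping matters. The two C functions below (hypothetical names) simply spell out the two groupings an ambiguous grammar would allow.

```c
/* Left grouping: (a - b) + c, the conventional reading. */
int parse_left(int a, int b, int c)
{
    return (a - b) + c;
}

/* Right grouping: a - (b + c), equally valid under the ambiguous grammar. */
int parse_right(int a, int b, int c)
{
    return a - (b + c);
}
```

For 1, 2, 3 the first gives 2 and the second gives -4, so a parser has to commit to one grouping.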
    24. SEMANTIC ANALYSIS (PARSING)
        • Validates expressions and takes appropriate actions.
        • Generates code.
        • Checks the source program for semantic errors.
        • Gathers type information for the subsequent code-generation phase.
        • Uses the hierarchical structure determined by the syntax-analysis phase to identify the operators and operands of expressions and statements.
        • Important component: type checking.
            • Ex. using a real number as an index for an array is an error.
            • If an arithmetic operation is applied to two numbers, one real and the other integer, a type conversion (intToReal) must be applied by adding an extra node.
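The type-checking step can be sketched in C as follows. The node layout, the 'c' marker for a conversion node, and the function names are all assumptions made for this illustration.

```c
#include <stdlib.h>

enum type { TYPE_INT, TYPE_REAL };

struct expr {
    enum type type;
    char op;                     /* '+' etc., 'c' for intToReal, 0 for a leaf */
    struct expr *left, *right;   /* right is NULL for leaves and conversions */
};

struct expr *new_expr(enum type t, char op, struct expr *l, struct expr *r)
{
    struct expr *e = malloc(sizeof *e);
    if (e == NULL)
        abort();
    e->type = t;
    e->op = op;
    e->left = l;
    e->right = r;
    return e;
}

/* Wrap an integer operand in an extra intToReal conversion node. */
static struct expr *to_real(struct expr *e)
{
    return e->type == TYPE_REAL ? e : new_expr(TYPE_REAL, 'c', e, NULL);
}

/* Build an arithmetic node, converting when the operand types differ. */
struct expr *check_arith(char op, struct expr *l, struct expr *r)
{
    if (l->type != r->type) {
        l = to_real(l);
        r = to_real(r);
    }
    return new_expr(l->type, op, l, r);
}

/* Array indices must be integers; returns 1 if the index is valid. */
int check_index(const struct expr *index)
{
    return index->type == TYPE_INT;
}
```

Adding an integer leaf to a real leaf yields a real-typed node whose integer operand sits under a conversion node, while a real-typed index is rejected.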
    25. QUESTIONS? Thank you for listening.