PLLab, NTHU,Cs2403 Programming
Languages
1
Lex Yacc tutorial
Kun-Yuan Hsieh
kyshieh@pllab.cs.nthu.edu.tw
Programming Language Lab., NTHU
Overview
take a glance at Lex!
Compilation Sequence
What is Lex?
• The main job of a lexical analyzer
(scanner) is to break up an input stream
into more usable elements (tokens)
a = b + c * d;
ID ASSIGN ID PLUS ID MULT ID SEMI
• Lex is a utility to help you rapidly
generate your scanners
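To make the token example above concrete, here is a minimal hand-written scanner in C for that one statement. It is only a sketch of the work Lex automates: the function names `tokenize` and `emit` and the `UNKNOWN` token are assumptions made for illustration and are not part of Lex.

```c
#include <ctype.h>
#include <string.h>

/* Append one token name to out, separated by single spaces. */
static void emit(char *out, size_t cap, const char *name) {
    if (out[0] != '\0')
        strncat(out, " ", cap - strlen(out) - 1);
    strncat(out, name, cap - strlen(out) - 1);
}

/* Hand-written scanner for the slide's example statement.
   Writes the token names into out (capacity cap bytes, cap > 0). */
void tokenize(const char *src, char *out, size_t cap) {
    out[0] = '\0';
    while (*src != '\0') {
        if (isspace((unsigned char)*src)) {          /* skip blanks */
            src++;
        } else if (isalpha((unsigned char)*src)) {   /* letter(letter|digit)* */
            while (isalnum((unsigned char)*src))
                src++;
            emit(out, cap, "ID");
        } else {
            switch (*src++) {
            case '=': emit(out, cap, "ASSIGN");  break;
            case '+': emit(out, cap, "PLUS");    break;
            case '*': emit(out, cap, "MULT");    break;
            case ';': emit(out, cap, "SEMI");    break;
            default:  emit(out, cap, "UNKNOWN"); break;  /* assumed fallback */
            }
        }
    }
}
```

Calling `tokenize("a = b + c * d;", buf, sizeof buf)` fills `buf` with `ID ASSIGN ID PLUS ID MULT ID SEMI`. Every new token kind needs another hand-coded branch here, which is the tedium a Lex rule table is meant to remove.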
Lex – Lexical Analyzer
• Lexical analyzers tokenize input streams
• Tokens are the terminals of a language
– English
• words, punctuation marks, …
– Programming language
• Identifiers, operators, keywords, …
• Regular expressions define
terminals/tokens
Lex Source Program
• Lex source is a table of
– regular expressions and
– corresponding program fragments
digit   [0-9]
letter  [a-zA-Z]
%%
{letter}({letter}|{digit})*   printf("id: %s\n", yytext);
\n                            printf("new line\n");
%%
int main(void) {
    yylex();
    return 0;
}
Lex Source to C Program
• The table is translated to a C program
(lex.yy.c) which
– reads an input stream,
– partitions the input into strings which
match the given expressions, and
– copies it to an output stream if necessary
An Overview of Lex
Lex source program → Lex → lex.yy.c
lex.yy.c → C compiler → a.out
input → a.out → tokens
Lex Source
• Lex source is separated into three sections by %%
delimiters
• The general format of Lex source is
{definitions}
%%    (required)
{transition rules}
%%    (optional)
{user subroutines}
• The absolute minimum Lex program is thus
%%
Lex vs. Yacc
• Lex
– Lex generates C code for a lexical analyzer, or
scanner
– Lex uses patterns that match strings in the input
and converts the strings to tokens
• Yacc
– Yacc generates C code for a syntax analyzer, or
parser.
– Yacc uses grammar rules that allow it to analyze
tokens from Lex and create a syntax tree.
Lex with Yacc
Lex source (Lexical Rules) → Lex → lex.yy.c, which defines yylex()
Yacc source (Grammar Rules) → Yacc → y.tab.c, which defines yyparse()
Input → yyparse() → Parsed Input
yyparse() calls yylex(); yylex() returns a token
Regular Expressions
Lex Regular Expressions
(Extended Regular Expressions)
• A regular expression matches a set of strings
• Regular expression
– Operators
– Character classes
– Arbitrary character
– Optional expressions
– Alternation and grouping
– Context sensitivity
– Repetitions and definitions
Operators
" \ [ ] ^ - ? . * + | ( ) $ / { } % < >
• If they are to be used as text characters, an
escape should be used
\$ = "$"
\\ = "\"
• Every character but blank, tab (\t), newline (\n)
and the list above is always a text character
Character Classes []
• [abc] matches a single character, which may
be a, b, or c
• Every operator meaning is ignored except \, -
and ^
• e.g.
[ab] => a or b
[a-z] => a or b or c or … or z
[-+0-9] => all the digits and the two signs
[^a-zA-Z] => any character which is not a
letter
Arbitrary Character .
• To match almost any character, use the
operator character . which is the class of all
characters except newline
• [\40-\176] matches all printable
characters in the ASCII character set,
from octal 40 (blank) to octal 176
(tilde ~)
Optional & Repeated
Expressions
• a? => zero or one instance of a
• a* => zero or more instances of a
• a+ => one or more instances of a
• E.g.
ab?c => ac or abc
[a-z]+ => all strings of lower case letters
[a-zA-Z][a-zA-Z0-9]* => all
alphanumeric strings with a leading
alphabetic character
Precedence of Operators
• Level of precedence
– Kleene closure (*), ?, +
– concatenation
– alternation (|)
• All operators are left associative.
• Ex: a*b|cd* = ((a*)b)|(c(d*))
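The grouping in the example above can be checked with a small C sketch. It uses the POSIX `regcomp`/`regexec` interface with extended regular expressions as a stand-in for Lex's matcher, which is an assumption: POSIX EREs agree with Lex on `*`, concatenation, and `|`, but differ in other details such as quoting and `{name}` definitions. The helper name `full_match` is hypothetical.

```c
#include <regex.h>
#include <stdio.h>

/* Return 1 if the whole of text matches the extended regular expression
   pattern, 0 if it does not, and -1 if the pattern fails to compile. */
int full_match(const char *pattern, const char *text) {
    char anchored[256];
    regex_t re;
    int rc;

    /* Wrap in ^( ... )$ so the alternation stays inside the anchors. */
    snprintf(anchored, sizeof anchored, "^(%s)$", pattern);
    if (regcomp(&re, anchored, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;
    rc = regexec(&re, text, 0, NULL, 0);
    regfree(&re);
    return rc == 0 ? 1 : 0;
}
```

With this helper, `a*b|cd*` and `((a*)b)|(c(d*))` both accept `aab` and `cdd` and both reject `acd`. The differently grouped `a*(b|c)d*` accepts `acd`, so the two readings really are distinct languages.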
Pattern Matching Primitives
Metacharacter   Matches
.               any character except newline
\n              newline
*               zero or more copies of the preceding expression
+               one or more copies of the preceding expression
?               zero or one copy of the preceding expression
^               beginning of line / complement
$               end of line
a|b             a or b
(ab)+           one or more copies of ab (grouping)
[ab]            a or b
a{3}            3 instances of a
"a+b"           literal "a+b" (C escapes still work)
Recall: Lex Source
• Lex source is a table of
– regular expressions and
– corresponding program fragments (actions)
…
%%
<regexp> <action>
<regexp> <action>
…
%%
%%
"="    printf("operator: ASSIGNMENT");
Input:  a = b + c;
Output: a operator: ASSIGNMENT b + c;
Transition Rules
• regexp <one or more blanks> action (C code);
• regexp <one or more blanks> { actions (C code) }
• A null statement ; will ignore the input (no
actions)
[ \t\n] ;
– Causes the three spacing characters to be ignored
a = b + c;
d = b * c;
↓ ↓
a=b+c;d=b*c;
Transition Rules (cont’d)
• Four special options for actions:
|, ECHO;, BEGIN, and REJECT;
• | indicates that the action for this rule is the
same as the action for the next rule
– [ \t\n] ;
– " " |
"\t" |
"\n" ;
• Input that matches no rule gets the default
action, which ECHOes it from the input to the output
Transition Rules (cont’d)
• REJECT
– Go do the next alternative
…
%%
pink {npink++; REJECT;}
ink {nink++; REJECT;}
pin {npin++; REJECT;}
. |
\n ;
%%
…
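The effect of the REJECT rules above can be sketched in plain C. The rules count every occurrence of pink, ink, and pin, including overlapping ones, because each REJECT hands the same input to the next matching rule before `.` finally consumes one character. The code below is not what Lex generates; it is a hand-written equivalent, and the names `struct counts` and `count_overlapping` are assumptions for illustration.

```c
#include <string.h>

struct counts { int npink, nink, npin; };

/* Count every occurrence of "pink", "ink" and "pin" in text, including
   overlapping ones, by testing each word at every starting position. */
struct counts count_overlapping(const char *text) {
    struct counts c = {0, 0, 0};
    for (const char *p = text; *p != '\0'; p++) {
        if (strncmp(p, "pink", 4) == 0) c.npink++;
        if (strncmp(p, "ink", 3) == 0)  c.nink++;
        if (strncmp(p, "pin", 3) == 0)  c.npin++;
    }
    return c;
}
```

For the input `pink`, all three counters end up at 1. Without REJECT, the scanner would take the longest match `pink`, consume it, and never count the `pin` and `ink` inside it.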
Lex Predefined Variables
• yytext -- a string containing the lexeme
• yyleng -- the length of the lexeme
• yyin -- the input stream pointer
– the default input of default main() is stdin
• yyout -- the output stream pointer
– the default output of default main() is stdout.
• Both can be redirected from the shell: ./a.out < inputfile > outfile
• E.g.
[a-z]+ printf("%s", yytext);
[a-z]+ ECHO;
[a-zA-Z]+ {words++; chars += yyleng;}
Lex Library Routines
• yylex()
– The default main() contains a call of yylex()
• yymore()
– append the next match to the current yytext instead of replacing it
• yyless(n)
– retain the first n characters in yytext and return the rest to the input
• yywrap()
– is called whenever Lex reaches an end-of-file
– The default yywrap() always returns 1
Review of Lex Predefined
Variables
Name                 Function
char *yytext         pointer to matched string
int yyleng           length of matched string
FILE *yyin           input stream pointer
FILE *yyout          output stream pointer
int yylex(void)      call to invoke lexer, returns token
char* yymore(void)   append the next match to yytext
int yyless(int n)    retain the first n characters in yytext
int yywrap(void)     wrapup, return 1 if done, 0 if not done
ECHO                 write matched string
REJECT               go to the next alternative rule
INITIAL              initial start condition
BEGIN condition      switch start condition
User Subroutines Section
• You can use your Lex routines in the same
ways you use routines in other programming
languages.
%{
void foo();
%}
letter [a-zA-Z]
%%
{letter}+ foo();
%%
…
void foo() {
…
}
User Subroutines Section
(cont’d)
• The section where main() is placed
%{
int counter = 0;
%}
letter [a-zA-Z]
%%
{letter}+ {printf("a word\n"); counter++;}
%%
int main(void) {
    yylex();
    printf("There are %d words in total\n", counter);
    return 0;
}
Usage
• To run Lex on a source file, type
lex scanner.l
• It produces a file named lex.yy.c which is
a C program for the lexical analyzer.
• To compile lex.yy.c, type
cc lex.yy.c -ll
• To run the lexical analyzer program, type
./a.out < inputfile
Versions of Lex
• AT&T -- lex
http://www.combo.org/lex_yacc_page/lex.html
• GNU -- flex
http://www.gnu.org/manual/flex-2.5.4/flex.html
• a Win32 version of flex :
http://www.monmouth.com/~wstreett/lex-yacc/lex-yacc.html
or Cygwin :
http://sources.redhat.com/cygwin/
• Lex on different machines is not created equal.
Yacc - Yet Another Compiler-
Compiler
Introduction
• What is YACC ?
– Tool which will produce a parser for a
given grammar.
– YACC (Yet Another Compiler Compiler)
is a program designed to compile a
LALR(1) grammar and to produce the
source code of the syntactic analyzer of
the language produced by this grammar.
How YACC Works
gram.y (file containing the desired grammar in yacc format)
→ yacc (yacc program)
→ y.tab.c (C source program created by yacc)
→ cc or gcc (C compiler)
→ a.out (executable program that will parse the grammar given in gram.y)
How YACC Works
(1) Parser generation time
YACC source (*.y) → yacc → y.tab.c, y.tab.h, y.output
(2) Compile time
y.tab.c → C compiler/linker → a.out
(3) Run time
Token stream → a.out → Abstract Syntax Tree
A YACC File Example
%{
#include <stdio.h>
%}
%token NAME NUMBER
%%
statement: NAME '=' expression
| expression { printf("= %d\n", $1); }
;
expression: expression '+' NUMBER { $$ = $1 + $3; }
| expression '-' NUMBER { $$ = $1 - $3; }
| NUMBER { $$ = $1; }
;
%%
int yyerror(char *s)
{
fprintf(stderr, "%s\n", s);
return 0;
}
int main(void)
{
yyparse();
return 0;
}
Works with Lex
How do Lex and Yacc work together?
Works with Lex
call yylex()
[0-9]+
next token is NUM
NUM '+' NUM
YACC File Format
%{
C declarations
%}
yacc declarations
%%
Grammar rules
%%
Additional C code
– Comments enclosed in /* ... */ may appear in
any of the sections.
Definitions Section
%{
#include <stdio.h>
#include <stdlib.h>
%}
%token ID NUM    (ID and NUM are terminals)
%start expr      (parsing starts from expr)
Start Symbol
• The first non-terminal specified in the
grammar specification section.
• To override it, use the %start declaration.
%start non-terminal
Rules Section
• This section defines grammar
• Example
expr : expr '+' term | term;
term : term '*' factor | factor;
factor : '(' expr ')' | ID | NUM;
Rules Section
• Normally written like this
• Example:
expr : expr '+' term
| term
;
term : term '*' factor
| factor
;
factor : '(' expr ')'
| ID
| NUM
;
The Position of Rules
expr : expr '+' term { $$ = $1 + $3; }
| term { $$ = $1; }
;
term : term '*' factor { $$ = $1 * $3; }
| factor { $$ = $1; }
;
factor : '(' expr ')' { $$ = $2; }
| ID
| NUM
;
The Position of Rules
expr : expr '+' term { $$ = $1 + $3; }
| term { $$ = $1; }
;
term : term '*' factor { $$ = $1 * $3; }
| factor { $$ = $1; }
;
factor : '(' expr ')' { $$ = $2; }
| ID
| NUM
;
$1 is the value of the first symbol on the right-hand side of a rule: the leading expr, term, factor, or '('.
The Position of Rules
expr : expr '+' term { $$ = $1 + $3; }
| term { $$ = $1; }
;
term : term '*' factor { $$ = $1 * $3; }
| factor { $$ = $1; }
;
factor : '(' expr ')' { $$ = $2; }
| ID
| NUM
;
$2 is the value of the second symbol on the right-hand side: '+', '*', or the expr inside the parentheses.
The Position of Rules
expr : expr '+' term { $$ = $1 + $3; }
| term { $$ = $1; }
;
term : term '*' factor { $$ = $1 * $3; }
| factor { $$ = $1; }
;
factor : '(' expr ')' { $$ = $2; }
| ID
| NUM
;
$3 is the value of the third symbol on the right-hand side: the trailing term, factor, or ')'.
Default action when a rule has none: $$ = $1;
Communication between LEX and
YACC
call yylex()
[0-9]+
next token is NUM
NUM '+' NUM
LEX and YACC need a way to agree on the identity of each token
Communication between LEX
and YACC
yacc -d gram.y
Will produce:
y.tab.h
• Use an enumeration or #define
• One side generates the definitions; the other
includes them
• YACC generates y.tab.h
• LEX includes y.tab.h
Communication between LEX
and YACC
scanner.l
%{
#include <stdio.h>
#include "y.tab.h"
%}
id [_a-zA-Z][_a-zA-Z0-9]*
%%
int { return INT; }
char { return CHAR; }
float { return FLOAT; }
{id} { return ID;}
parser.y
%{
#include <stdio.h>
#include <stdlib.h>
%}
%token CHAR FLOAT ID INT
%%
yacc -d xxx.y produces y.tab.h:
# define CHAR 258
# define FLOAT 259
# define ID 260
# define INT 261
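The shared token codes can be sketched in plain C as follows. This is only an illustration of the contract between the two tools: the numeric values are copied from the slide and depend on the yacc implementation, and `classify_word` is a hypothetical stand-in for the scanner rules, not a function that Lex or Yacc provides.

```c
#include <string.h>

/* Token codes as they might appear in a generated y.tab.h (values taken
   from the slide; real values depend on the yacc implementation). */
enum { CHAR = 258, FLOAT = 259, ID = 260, INT = 261 };

/* Map one already-isolated word to its token code, mirroring the
   scanner rules: keywords first, everything else is an identifier. */
int classify_word(const char *word) {
    if (strcmp(word, "int") == 0)   return INT;
    if (strcmp(word, "char") == 0)  return CHAR;
    if (strcmp(word, "float") == 0) return FLOAT;
    return ID;
}
```

The keyword checks come before the identifier fallback for the same reason the keyword rules precede `{id}` in the scanner: when two Lex rules match the same longest string, the rule listed first wins.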
YACC
• Rules may be recursive
• Rules may be ambiguous*
• Uses bottom up Shift/Reduce parsing
– Get a token
– Push onto stack
– Can it be reduced? (How do we know?)
• If yes: Reduce using a rule
• If no: Get another token
• Yacc cannot look ahead more than one token
Phrase -> cart_animal AND CART
| work_animal AND PLOW
…
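The get-a-token, push, reduce loop described above can be sketched by hand for the small grammar `expr: expr '+' NUMBER | expr '-' NUMBER | NUMBER`. This is a simplification under stated assumptions: a real yacc parser is driven by generated state tables and a one-token lookahead, while this sketch simply inspects the top of the stack. The names `shift_reduce_eval`, `struct item`, and the 64-entry stack are hypothetical.

```c
#include <ctype.h>
#include <stdlib.h>

enum sym { NUMBER, EXPR, PLUS, MINUS };

struct item { enum sym sym; long val; };

/* Evaluate a string such as "4 + 6 - 3" with an explicit parse stack.
   Returns 0 on success and stores the result in *result; returns -1 on
   a syntax error. Hypothetical helper, not generated by yacc. */
int shift_reduce_eval(const char *s, long *result) {
    struct item stack[64];
    int top = 0;                       /* number of items on the stack */

    for (;;) {
        while (isspace((unsigned char)*s))
            s++;
        if (*s == '\0')
            break;
        if (top == 64)
            return -1;                 /* stack overflow */

        /* Shift: get a token and push it onto the stack. */
        if (isdigit((unsigned char)*s)) {
            char *end;
            stack[top].sym = NUMBER;
            stack[top].val = strtol(s, &end, 10);
            s = end;
        } else if (*s == '+' || *s == '-') {
            stack[top].sym = (*s == '+') ? PLUS : MINUS;
            stack[top].val = 0;
            s++;
        } else {
            return -1;                 /* unknown character */
        }
        top++;

        /* Reduce: does the top of the stack match a right-hand side? */
        if (top == 1 && stack[0].sym == NUMBER) {
            stack[0].sym = EXPR;                       /* expr: NUMBER */
        } else if (top == 3 && stack[0].sym == EXPR &&
                   stack[2].sym == NUMBER &&
                   (stack[1].sym == PLUS || stack[1].sym == MINUS)) {
            /* expr: expr '+' NUMBER | expr '-' NUMBER */
            stack[0].val = (stack[1].sym == PLUS)
                         ? stack[0].val + stack[2].val
                         : stack[0].val - stack[2].val;
            top = 1;
        }
    }
    if (top != 1 || stack[0].sym != EXPR)
        return -1;                     /* input did not reduce to expr */
    *result = stack[0].val;
    return 0;
}
```

For `"4 + 6 - 3"` the stack goes NUMBER, EXPR, EXPR '+', EXPR '+' NUMBER, EXPR, and so on, and the final value is 7. Because the grammar is left-recursive, each reduction happens as soon as the right operand arrives, so a valid input never needs more than three stack entries.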
Yacc Example
• Taken from Lex & Yacc
• Simple calculator
a = 4 + 6
a
a=10
b = 7
c = a + b
c
c = 17
$
Grammar
expression ::= expression '+' term |
expression '-' term |
term
term ::= term '*' factor |
term '/' factor |
factor
factor ::= '(' expression ')' |
'-' factor |
NUMBER |
NAME
Parser (cont’d)
parser.y
statement_list: statement '\n'
| statement_list statement '\n'
;
statement: NAME '=' expression { $1->value = $3; }
| expression { printf("= %g\n", $1); }
;
expression: expression '+' term { $$ = $1 + $3; }
| expression '-' term { $$ = $1 - $3; }
| term
;
Parser (cont’d)
parser.y
term: term '*' factor { $$ = $1 * $3; }
| term '/' factor { if ($3 == 0.0)
                        yyerror("divide by zero");
                    else
                        $$ = $1 / $3;
                  }
| factor
;
factor: '(' expression ')' { $$ = $2; }
| '-' factor { $$ = -$2; }
| NUMBER { $$ = $1; }
| NAME { $$ = $1->value; }
;
%%
Scanner
scanner.l
%{
#include "y.tab.h"
#include "parser.h"
#include <math.h>
#include <stdlib.h>   /* atof */
%}
%%
([0-9]+|([0-9]*\.[0-9]+)([eE][-+]?[0-9]+)?) {
    yylval.dval = atof(yytext);
    return NUMBER;
}
[ \t] ; /* ignore white space */
Scanner (cont’d)
scanner.l
[A-Za-z][A-Za-z0-9]* { /* return symbol pointer */
    yylval.symp = symlook(yytext);
    return NAME;
}
"$" { return 0; /* end of input */ }
\n|"="|"+"|"-"|"*"|"/" return yytext[0];
%%
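The scanner above calls symlook() and the parser reads and writes a value field through the returned pointer, but the slides do not show parser.h or the lookup routine. A minimal sketch of what they might contain follows. The struct layout, the table size NSYMS, and the linear search are all assumptions made for illustration, not the deck's actual code.

```c
#include <stdlib.h>
#include <string.h>

#define NSYMS 20                /* assumed table size */

struct symtab {
    char  *name;                /* NULL marks an unused slot */
    double value;
};

static struct symtab symtab[NSYMS];

/* Return the entry for name, creating it on first use.
   Returns NULL if the table is full or memory runs out. */
struct symtab *symlook(const char *name) {
    for (struct symtab *sp = symtab; sp < &symtab[NSYMS]; sp++) {
        if (sp->name != NULL && strcmp(sp->name, name) == 0)
            return sp;                      /* already known */
        if (sp->name == NULL) {             /* free slot: claim it */
            size_t n = strlen(name) + 1;
            sp->name = malloc(n);
            if (sp->name == NULL)
                return NULL;
            memcpy(sp->name, name, n);
            sp->value = 0.0;
            return sp;
        }
    }
    return NULL;                            /* table full */
}
```

Because the same pointer comes back for the same name, an assignment such as `a = 4 + 6` stores 10 in the entry for `a`, and a later reference to `a` reads it back through `$1->value`.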
YACC Command
• Yacc (AT&T)
– yacc -d xxx.y
• Bison (GNU)
– bison -d -y xxx.y
With -y, bison produces y.tab.c, the same as yacc;
without it, bison produces xxx.tab.c
Precedence /
Association
(1) 1 - 2 - 3
(2) 1 - 2 * 3
1. 1-2-3 = (1-2)-3? or 1-(2-3)?
Define the '-' operator to be left-associative.
2. 1-2*3 = 1-(2*3)
Define the '*' operator to have higher precedence than the '-'
operator.
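The two cases above can be worked through with a small evaluator in C. Note that this uses precedence climbing, a hand-written technique swapped in for illustration; yacc itself applies precedence and associativity while resolving conflicts in its parse tables. The names `eval`, `parse_expr`, and `prec` are hypothetical.

```c
#include <ctype.h>
#include <stdlib.h>

static const char *cur;                 /* current position in the input */

static void skip_blanks(void) {
    while (isspace((unsigned char)*cur))
        cur++;
}

/* Binding strength of a binary operator; 0 means "not an operator". */
static int prec(char op) {
    switch (op) {
    case '+': case '-': return 1;
    case '*': case '/': return 2;
    default:            return 0;
    }
}

/* Parse operators whose precedence is at least min_prec. */
static long parse_expr(int min_prec) {
    char *end;
    long lhs = strtol(cur, &end, 10);   /* operand: an integer literal */
    cur = end;
    for (;;) {
        skip_blanks();
        char op = *cur;
        int p = prec(op);
        if (p == 0 || p < min_prec)
            return lhs;
        cur++;
        skip_blanks();
        /* Left associativity: the right operand may only contain
           operators that bind tighter than op, hence p + 1. */
        long rhs = parse_expr(p + 1);
        switch (op) {
        case '+': lhs += rhs; break;
        case '-': lhs -= rhs; break;
        case '*': lhs *= rhs; break;
        case '/': lhs = (rhs != 0) ? lhs / rhs : 0; break; /* 0 on divide by zero */
        }
    }
}

/* Evaluate an expression of integers and + - * / without parentheses. */
long eval(const char *text) {
    cur = text;
    skip_blanks();
    return parse_expr(1);
}
```

Here `eval("1-2-3")` groups as (1-2)-3 and gives -4, while `eval("1-2*3")` groups as 1-(2*3) and gives -5. Passing `p` instead of `p + 1` in the recursive call would make the operators right-associative, which is the choice that `%right` expresses in yacc.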
Precedence /
Association
%right '='
%left '<' '>' NE LE GE
%left '+' '-'
%left '*' '/'    (highest precedence)
Precedence /
Association
%left '+' '-'
%left '*' '/'
%nonassoc UMINUS
expr : expr '+' expr { $$ = $1 + $3; }
| expr '-' expr { $$ = $1 - $3; }
| expr '*' expr { $$ = $1 * $3; }
| expr '/' expr
{
    if ($3 == 0)
        yyerror("divide 0");
    else
        $$ = $1 / $3;
}
| '-' expr %prec UMINUS { $$ = -$2; }
Shift/Reduce Conflicts
• shift/reduce conflict
– occurs when a grammar is written in
such a way that a decision between
shifting and reducing cannot be made.
– e.g. the IF-ELSE ambiguity.
• To resolve this conflict, yacc will
choose to shift.
YACC Declaration
Summary
`%start'
Specify the grammar's start symbol
`%union'
Declare the collection of data types that semantic values may
have
`%token'
Declare a terminal symbol (token type name) with no
precedence or
associativity specified
`%type'
Declare the type of semantic values for a nonterminal symbol
YACC Declaration
Summary
`%right'
Declare a terminal symbol (token type name) that is
right-associative
`%left'
Declare a terminal symbol (token type name) that is left-
associative
`%nonassoc'
Declare a terminal symbol (token type name) that is
nonassociative
(using it in a way that would be associative is a syntax error,
ex: x op. y op. z is syntax error)
Reference Books
• lex & yacc, 2nd Edition
– by John R. Levine, Tony Mason & Doug
Brown
– O’Reilly
– ISBN: 1-56592-000-7
• Mastering Regular Expressions
– by Jeffrey E.F. Friedl
– O’Reilly
– ISBN: 1-56592-257-3

More Related Content

Similar to lex and yacc.pdf

Compiler Design Tutorial
Compiler Design Tutorial Compiler Design Tutorial
Compiler Design Tutorial
Sarit Chakraborty
 
Lex & yacc
Lex & yaccLex & yacc
Lex & yacc
Taha Malampatti
 
Module4 lex and yacc.ppt
Module4 lex and yacc.pptModule4 lex and yacc.ppt
Module4 lex and yacc.ppt
ProddaturNagaVenkata
 
Compiler Engineering Lab#5 : Symbol Table, Flex Tool
Compiler Engineering Lab#5 : Symbol Table, Flex ToolCompiler Engineering Lab#5 : Symbol Table, Flex Tool
Compiler Engineering Lab#5 : Symbol Table, Flex Tool
MashaelQ
 
lecture_lex.pdf
lecture_lex.pdflecture_lex.pdf
lecture_lex.pdf
DrNilotpalChakrabort
 
LVEE 2014: Text parsing with Python and PLY
LVEE 2014: Text parsing with Python and PLYLVEE 2014: Text parsing with Python and PLY
LVEE 2014: Text parsing with Python and PLY
dmbaturin
 
Lisp and scheme i
Lisp and scheme iLisp and scheme i
Lisp and scheme i
Luis Goldster
 
Saumya Debray The University of Arizona Tucson
Saumya Debray The University of Arizona TucsonSaumya Debray The University of Arizona Tucson
Saumya Debray The University of Arizona Tucson
jeronimored
 
Compiler Construction | Lecture 10 | Data-Flow Analysis
Compiler Construction | Lecture 10 | Data-Flow AnalysisCompiler Construction | Lecture 10 | Data-Flow Analysis
Compiler Construction | Lecture 10 | Data-Flow Analysis
Eelco Visser
 
Ch3.ppt
Ch3.pptCh3.ppt
Ch3.ppt
ProvatMajhi
 
Ch3.ppt
Ch3.pptCh3.ppt
Regular expressions h1
Regular expressions h1Regular expressions h1
Regular expressions h1
Rajendran
 
Lexical
LexicalLexical
Lex tool manual
Lex tool manualLex tool manual
Lex tool manualSami Said
 
Tools for reading papers
Tools for reading papersTools for reading papers
Tools for reading papers
Jack Fox
 
Theory of computing
Theory of computingTheory of computing
Theory of computingRanjan Kumar
 
Lex and Yacc Tool M1.ppt
Lex and Yacc Tool M1.pptLex and Yacc Tool M1.ppt
Lex and Yacc Tool M1.ppt
MohitJain296729
 

Similar to lex and yacc.pdf (20)

Compiler Design Tutorial
Compiler Design Tutorial Compiler Design Tutorial
Compiler Design Tutorial
 
Lex & yacc
Lex & yaccLex & yacc
Lex & yacc
 
Module4 lex and yacc.ppt
Module4 lex and yacc.pptModule4 lex and yacc.ppt
Module4 lex and yacc.ppt
 
Compiler Engineering Lab#5 : Symbol Table, Flex Tool
Compiler Engineering Lab#5 : Symbol Table, Flex ToolCompiler Engineering Lab#5 : Symbol Table, Flex Tool
Compiler Engineering Lab#5 : Symbol Table, Flex Tool
 
lecture_lex.pdf
lecture_lex.pdflecture_lex.pdf
lecture_lex.pdf
 
Lexicalanalyzer
LexicalanalyzerLexicalanalyzer
Lexicalanalyzer
 
Lexicalanalyzer
LexicalanalyzerLexicalanalyzer
Lexicalanalyzer
 
LVEE 2014: Text parsing with Python and PLY
LVEE 2014: Text parsing with Python and PLYLVEE 2014: Text parsing with Python and PLY
LVEE 2014: Text parsing with Python and PLY
 
Lisp and scheme i
Lisp and scheme iLisp and scheme i
Lisp and scheme i
 
Lexyacc
LexyaccLexyacc
Lexyacc
 
Saumya Debray The University of Arizona Tucson
Saumya Debray The University of Arizona TucsonSaumya Debray The University of Arizona Tucson
Saumya Debray The University of Arizona Tucson
 
Compiler Construction | Lecture 10 | Data-Flow Analysis
Compiler Construction | Lecture 10 | Data-Flow AnalysisCompiler Construction | Lecture 10 | Data-Flow Analysis
Compiler Construction | Lecture 10 | Data-Flow Analysis
 
Ch3.ppt
Ch3.pptCh3.ppt
Ch3.ppt
 
Ch3.ppt
Ch3.pptCh3.ppt
Ch3.ppt
 
Regular expressions h1
Regular expressions h1Regular expressions h1
Regular expressions h1
 
Lexical
LexicalLexical
Lexical
 
Lex tool manual
Lex tool manualLex tool manual
Lex tool manual
 
Tools for reading papers
Tools for reading papersTools for reading papers
Tools for reading papers
 
Theory of computing
Theory of computingTheory of computing
Theory of computing
 
Lex and Yacc Tool M1.ppt
Lex and Yacc Tool M1.pptLex and Yacc Tool M1.ppt
Lex and Yacc Tool M1.ppt
 

Recently uploaded

Basic Industrial Engineering terms for apparel
Basic Industrial Engineering terms for apparelBasic Industrial Engineering terms for apparel
Basic Industrial Engineering terms for apparel
top1002
 
block diagram and signal flow graph representation
block diagram and signal flow graph representationblock diagram and signal flow graph representation
block diagram and signal flow graph representation
Divya Somashekar
 
Investor-Presentation-Q1FY2024 investor presentation document.pptx
Investor-Presentation-Q1FY2024 investor presentation document.pptxInvestor-Presentation-Q1FY2024 investor presentation document.pptx
Investor-Presentation-Q1FY2024 investor presentation document.pptx
AmarGB2
 
Immunizing Image Classifiers Against Localized Adversary Attacks
Immunizing Image Classifiers Against Localized Adversary AttacksImmunizing Image Classifiers Against Localized Adversary Attacks
Immunizing Image Classifiers Against Localized Adversary Attacks
gerogepatton
 
DESIGN A COTTON SEED SEPARATION MACHINE.docx
DESIGN A COTTON SEED SEPARATION MACHINE.docxDESIGN A COTTON SEED SEPARATION MACHINE.docx
DESIGN A COTTON SEED SEPARATION MACHINE.docx
FluxPrime1
 
MCQ Soil mechanics questions (Soil shear strength).pdf
MCQ Soil mechanics questions (Soil shear strength).pdfMCQ Soil mechanics questions (Soil shear strength).pdf
MCQ Soil mechanics questions (Soil shear strength).pdf
Osamah Alsalih
 
HYDROPOWER - Hydroelectric power generation
HYDROPOWER - Hydroelectric power generationHYDROPOWER - Hydroelectric power generation
HYDROPOWER - Hydroelectric power generation
Robbie Edward Sayers
 
Recycled Concrete Aggregate in Construction Part III
Recycled Concrete Aggregate in Construction Part IIIRecycled Concrete Aggregate in Construction Part III
Recycled Concrete Aggregate in Construction Part III
Aditya Rajan Patra
 
6th International Conference on Machine Learning & Applications (CMLA 2024)
6th International Conference on Machine Learning & Applications (CMLA 2024)6th International Conference on Machine Learning & Applications (CMLA 2024)
6th International Conference on Machine Learning & Applications (CMLA 2024)
ClaraZara1
 
NO1 Uk best vashikaran specialist in delhi vashikaran baba near me online vas...
NO1 Uk best vashikaran specialist in delhi vashikaran baba near me online vas...NO1 Uk best vashikaran specialist in delhi vashikaran baba near me online vas...
NO1 Uk best vashikaran specialist in delhi vashikaran baba near me online vas...
Amil Baba Dawood bangali
 
一比一原版(UofT毕业证)多伦多大学毕业证成绩单如何办理
一比一原版(UofT毕业证)多伦多大学毕业证成绩单如何办理一比一原版(UofT毕业证)多伦多大学毕业证成绩单如何办理
一比一原版(UofT毕业证)多伦多大学毕业证成绩单如何办理
ydteq
 
Top 10 Oil and Gas Projects in Saudi Arabia 2024.pdf
Top 10 Oil and Gas Projects in Saudi Arabia 2024.pdfTop 10 Oil and Gas Projects in Saudi Arabia 2024.pdf
Top 10 Oil and Gas Projects in Saudi Arabia 2024.pdf
Teleport Manpower Consultant
 
Heap Sort (SS).ppt FOR ENGINEERING GRADUATES, BCA, MCA, MTECH, BSC STUDENTS
Heap Sort (SS).ppt FOR ENGINEERING GRADUATES, BCA, MCA, MTECH, BSC STUDENTSHeap Sort (SS).ppt FOR ENGINEERING GRADUATES, BCA, MCA, MTECH, BSC STUDENTS
Heap Sort (SS).ppt FOR ENGINEERING GRADUATES, BCA, MCA, MTECH, BSC STUDENTS
Soumen Santra
 
DfMAy 2024 - key insights and contributions
DfMAy 2024 - key insights and contributionsDfMAy 2024 - key insights and contributions
DfMAy 2024 - key insights and contributions
gestioneergodomus
 
Standard Reomte Control Interface - Neometrix
Standard Reomte Control Interface - NeometrixStandard Reomte Control Interface - Neometrix
Standard Reomte Control Interface - Neometrix
Neometrix_Engineering_Pvt_Ltd
 
Pile Foundation by Venkatesh Taduvai (Sub Geotechnical Engineering II)-conver...
Pile Foundation by Venkatesh Taduvai (Sub Geotechnical Engineering II)-conver...Pile Foundation by Venkatesh Taduvai (Sub Geotechnical Engineering II)-conver...
Pile Foundation by Venkatesh Taduvai (Sub Geotechnical Engineering II)-conver...
AJAYKUMARPUND1
 
14 Template Contractual Notice - EOT Application
14 Template Contractual Notice - EOT Application14 Template Contractual Notice - EOT Application
14 Template Contractual Notice - EOT Application
SyedAbiiAzazi1
 
NUMERICAL SIMULATIONS OF HEAT AND MASS TRANSFER IN CONDENSING HEAT EXCHANGERS...
NUMERICAL SIMULATIONS OF HEAT AND MASS TRANSFER IN CONDENSING HEAT EXCHANGERS...NUMERICAL SIMULATIONS OF HEAT AND MASS TRANSFER IN CONDENSING HEAT EXCHANGERS...
NUMERICAL SIMULATIONS OF HEAT AND MASS TRANSFER IN CONDENSING HEAT EXCHANGERS...
ssuser7dcef0
 
ML for identifying fraud using open blockchain data.pptx
ML for identifying fraud using open blockchain data.pptxML for identifying fraud using open blockchain data.pptx
ML for identifying fraud using open blockchain data.pptx
Vijay Dialani, PhD
 
Tutorial for 16S rRNA Gene Analysis with QIIME2.pdf
Tutorial for 16S rRNA Gene Analysis with QIIME2.pdfTutorial for 16S rRNA Gene Analysis with QIIME2.pdf
Tutorial for 16S rRNA Gene Analysis with QIIME2.pdf
aqil azizi
 

Recently uploaded (20)

Basic Industrial Engineering terms for apparel
Basic Industrial Engineering terms for apparelBasic Industrial Engineering terms for apparel
Basic Industrial Engineering terms for apparel
 
block diagram and signal flow graph representation
block diagram and signal flow graph representationblock diagram and signal flow graph representation
block diagram and signal flow graph representation
 
Investor-Presentation-Q1FY2024 investor presentation document.pptx
Investor-Presentation-Q1FY2024 investor presentation document.pptxInvestor-Presentation-Q1FY2024 investor presentation document.pptx
Investor-Presentation-Q1FY2024 investor presentation document.pptx
 
Immunizing Image Classifiers Against Localized Adversary Attacks
Immunizing Image Classifiers Against Localized Adversary AttacksImmunizing Image Classifiers Against Localized Adversary Attacks
Immunizing Image Classifiers Against Localized Adversary Attacks
 
DESIGN A COTTON SEED SEPARATION MACHINE.docx
DESIGN A COTTON SEED SEPARATION MACHINE.docxDESIGN A COTTON SEED SEPARATION MACHINE.docx
DESIGN A COTTON SEED SEPARATION MACHINE.docx
 
MCQ Soil mechanics questions (Soil shear strength).pdf
MCQ Soil mechanics questions (Soil shear strength).pdfMCQ Soil mechanics questions (Soil shear strength).pdf
MCQ Soil mechanics questions (Soil shear strength).pdf
 
HYDROPOWER - Hydroelectric power generation
HYDROPOWER - Hydroelectric power generationHYDROPOWER - Hydroelectric power generation
HYDROPOWER - Hydroelectric power generation
 
Recycled Concrete Aggregate in Construction Part III
Recycled Concrete Aggregate in Construction Part IIIRecycled Concrete Aggregate in Construction Part III
Recycled Concrete Aggregate in Construction Part III
 
6th International Conference on Machine Learning & Applications (CMLA 2024)
6th International Conference on Machine Learning & Applications (CMLA 2024)6th International Conference on Machine Learning & Applications (CMLA 2024)
6th International Conference on Machine Learning & Applications (CMLA 2024)
 
NO1 Uk best vashikaran specialist in delhi vashikaran baba near me online vas...
NO1 Uk best vashikaran specialist in delhi vashikaran baba near me online vas...NO1 Uk best vashikaran specialist in delhi vashikaran baba near me online vas...
NO1 Uk best vashikaran specialist in delhi vashikaran baba near me online vas...
 
一比一原版(UofT毕业证)多伦多大学毕业证成绩单如何办理
一比一原版(UofT毕业证)多伦多大学毕业证成绩单如何办理一比一原版(UofT毕业证)多伦多大学毕业证成绩单如何办理
一比一原版(UofT毕业证)多伦多大学毕业证成绩单如何办理
 
Top 10 Oil and Gas Projects in Saudi Arabia 2024.pdf
Top 10 Oil and Gas Projects in Saudi Arabia 2024.pdfTop 10 Oil and Gas Projects in Saudi Arabia 2024.pdf
Top 10 Oil and Gas Projects in Saudi Arabia 2024.pdf
 
Heap Sort (SS).ppt FOR ENGINEERING GRADUATES, BCA, MCA, MTECH, BSC STUDENTS
Heap Sort (SS).ppt FOR ENGINEERING GRADUATES, BCA, MCA, MTECH, BSC STUDENTSHeap Sort (SS).ppt FOR ENGINEERING GRADUATES, BCA, MCA, MTECH, BSC STUDENTS
Heap Sort (SS).ppt FOR ENGINEERING GRADUATES, BCA, MCA, MTECH, BSC STUDENTS
 
DfMAy 2024 - key insights and contributions
DfMAy 2024 - key insights and contributionsDfMAy 2024 - key insights and contributions
DfMAy 2024 - key insights and contributions
 
Standard Reomte Control Interface - Neometrix
Standard Reomte Control Interface - NeometrixStandard Reomte Control Interface - Neometrix
Standard Reomte Control Interface - Neometrix
 
Pile Foundation by Venkatesh Taduvai (Sub Geotechnical Engineering II)-conver...
Pile Foundation by Venkatesh Taduvai (Sub Geotechnical Engineering II)-conver...Pile Foundation by Venkatesh Taduvai (Sub Geotechnical Engineering II)-conver...
Pile Foundation by Venkatesh Taduvai (Sub Geotechnical Engineering II)-conver...
 
14 Template Contractual Notice - EOT Application
14 Template Contractual Notice - EOT Application14 Template Contractual Notice - EOT Application
14 Template Contractual Notice - EOT Application
 
NUMERICAL SIMULATIONS OF HEAT AND MASS TRANSFER IN CONDENSING HEAT EXCHANGERS...
NUMERICAL SIMULATIONS OF HEAT AND MASS TRANSFER IN CONDENSING HEAT EXCHANGERS...NUMERICAL SIMULATIONS OF HEAT AND MASS TRANSFER IN CONDENSING HEAT EXCHANGERS...
NUMERICAL SIMULATIONS OF HEAT AND MASS TRANSFER IN CONDENSING HEAT EXCHANGERS...
 
ML for identifying fraud using open blockchain data.pptx
ML for identifying fraud using open blockchain data.pptxML for identifying fraud using open blockchain data.pptx
ML for identifying fraud using open blockchain data.pptx
 
Tutorial for 16S rRNA Gene Analysis with QIIME2.pdf
Tutorial for 16S rRNA Gene Analysis with QIIME2.pdfTutorial for 16S rRNA Gene Analysis with QIIME2.pdf
Tutorial for 16S rRNA Gene Analysis with QIIME2.pdf
 

lex and yacc.pdf

  • 1. PLLab, NTHU,Cs2403 Programming Languages 1 Lex Yacc tutorial Kun-Yuan Hsieh kyshieh@pllab.cs.nthu.edu.tw Programming Language Lab., NTHU
  • 4. PLLab, NTHU,Cs2403 Programming Languages 4 What is Lex? • The main job of a lexical analyzer (scanner) is to break up an input stream into more usable elements (tokens) a = b + c * d; ID ASSIGN ID PLUS ID MULT ID SEMI • Lex is an utility to help you rapidly generate your scanners
  • 5. PLLab, NTHU,Cs2403 Programming Languages 5 Lex – Lexical Analyzer • Lexical analyzers tokenize input streams • Tokens are the terminals of a language – English • words, punctuation marks, … – Programming language • Identifiers, operators, keywords, … • Regular expressions define terminals/tokens
  • 6. PLLab, NTHU,Cs2403 Programming Languages 6 Lex Source Program • Lex source is a table of – regular expressions and – corresponding program fragments digit [0-9] letter [a-zA-Z] %% {letter}({letter}|{digit})* printf(“id: %sn”, yytext); n printf(“new linen”); %% main() { yylex(); }
  • 7. PLLab, NTHU,Cs2403 Programming Languages 7 Lex Source to C Program • The table is translated to a C program (lex.yy.c) which – reads an input stream – partitioning the input into strings which match the given expressions and – copying it to an output stream if necessary
  • 8. PLLab, NTHU,Cs2403 Programming Languages 8 An Overview of Lex Lex C compiler a.out Lex source program lex.yy.c input lex.yy.c a.out tokens
  • 9. PLLab, NTHU,Cs2403 Programming Languages 9 (optional) (required) Lex Source • Lex source is separated into three sections by % % delimiters • The general format of Lex source is • The absolute minimum Lex program is thus {definitions} %% {transition rules} %% {user subroutines} %%
  • 10. PLLab, NTHU,Cs2403 Programming Languages 10 Lex v.s. Yacc • Lex – Lex generates C code for a lexical analyzer, or scanner – Lex uses patterns that match strings in the input and converts the strings to tokens • Yacc – Yacc generates C code for syntax analyzer, or parser. – Yacc uses grammar rules that allow it to analyze tokens from Lex and create a syntax tree.
  • 11. Lex with Yacc
          Lex source (lexical rules) → Lex → lex.yy.c, which provides yylex()
          Yacc source (grammar rules) → Yacc → y.tab.c, which provides yyparse()
          Input → yylex() → yyparse() → parsed input
      • yyparse() calls yylex(); yylex() returns a token
  • 13. Lex Regular Expressions (Extended Regular Expressions)
      • A regular expression matches a set of strings
      • Regular expression
          – Operators
          – Character classes
          – Arbitrary character
          – Optional expressions
          – Alternation and grouping
          – Context sensitivity
          – Repetitions and definitions
  • 14. Operators
          " \ [ ] ^ - ? . * + | ( ) $ / { } % < >
      • If they are to be used as text characters, an escape should be used:
          \$ = "$"
          \\ = "\"
      • Every character except blank, tab (\t), newline (\n), and the operators listed above is always a text character
  • 15. Character Classes []
      • [abc] matches a single character, which may be a, b, or c
      • Inside brackets every operator meaning is ignored except \, -, and ^
      • e.g.
          [ab]       => a or b
          [a-z]      => a or b or c or … or z
          [-+0-9]    => all the digits and the two signs
          [^a-zA-Z]  => any character which is not a letter
  • 16. Arbitrary Character .
      • To match almost any character, use the operator character ., which is the class of all characters except newline
      • [\40-\176] matches all printable characters in the ASCII character set, from octal 40 (blank) to octal 176 (tilde ~)
  • 17. Optional & Repeated Expressions
      • a? => zero or one instance of a
      • a* => zero or more instances of a
      • a+ => one or more instances of a
      • E.g.
          ab?c                   => ac or abc
          [a-z]+                 => all non-empty strings of lower case letters
          [a-zA-Z][a-zA-Z0-9]*   => all alphanumeric strings with a leading alphabetic character
  • 18. Precedence of Operators
      • Levels of precedence, from highest to lowest
          – Kleene closure (*), ?, +
          – concatenation
          – alternation (|)
      • All operators are left associative
      • Ex: a*b|cd* = ((a*)b)|(c(d*))
  • 19. Pattern Matching Primitives
          Metacharacter   Matches
          .               any character except newline
          \n              newline
          *               zero or more copies of the preceding expression
          +               one or more copies of the preceding expression
          ?               zero or one copy of the preceding expression
          ^               beginning of line / complement (as the first character inside [ ])
          $               end of line
          a|b             a or b
          (ab)+           one or more copies of ab (grouping)
          [ab]            a or b
          a{3}            3 instances of a
          "a+b"           literal "a+b" (C escapes still work)
  • 20. Recall: Lex Source
      • Lex source is a table of
          – regular expressions and
          – corresponding program fragments (actions)
          …
          %%
          <regexp> <action>
          <regexp> <action>
          …
          %%
      • Example:
          %%
          "="   printf("operator: ASSIGNMENT");
        Input:  a = b + c;
        Output: a operator: ASSIGNMENT b + c;
  • 21. Transition Rules
      • regexp <one or more blanks> action (C code);
      • regexp <one or more blanks> { actions (C code) }
      • A null statement ; ignores the matched input (no action):
          [ \t\n]   ;
          – Causes the three spacing characters to be ignored:
          a = b + c;
          d = b * c;
              ↓
          a=b+c;d=b*c;
  • 22. Transition Rules (cont’d)
      • Four special options for actions: |, ECHO;, BEGIN, and REJECT;
      • | indicates that the action for this rule is the same as the action for the next rule, so these two forms are equivalent:
          – [ \t\n]   ;
          – " "    |
            "\t"   |
            "\n"   ;
      • Unmatched input gets the default action, ECHO, which copies it from the input to the output
  • 23. Transition Rules (cont’d)
      • REJECT
          – Go do the next alternative
          …
          %%
          pink   {npink++; REJECT;}
          ink    {nink++; REJECT;}
          pin    {npin++; REJECT;}
          .      |
          \n     ;
          %%
          …
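
The effect of the REJECT rules is that every occurrence of pink, ink, and pin is counted, including overlapping ones: after each REJECT the scanner tries the next-best match, and the final . rule advances by one character. A minimal C sketch of the same counting, with no Lex involved, might look like this. The struct and function names are assumptions made for illustration.

```c
#include <string.h>

/* Counters named after the variables in the slide. */
struct counts { int npink, nink, npin; };

/* Counts every occurrence of "pink", "ink" and "pin" in `s`,
   including overlapping ones, by testing each start position. */
struct counts count_pink(const char *s)
{
    struct counts c = {0, 0, 0};
    for (; *s != '\0'; s++) {
        /* strncmp stops at the terminating NUL, so short tails are safe */
        if (strncmp(s, "pink", 4) == 0) c.npink++;
        if (strncmp(s, "ink", 3) == 0)  c.nink++;
        if (strncmp(s, "pin", 3) == 0)  c.npin++;
    }
    return c;
}
```

For the input `a pink pin` this yields npink = 1, nink = 1, and npin = 2, because the word pink also contains both ink and pin. Without REJECT, Lex would report only the longest match at each position.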
  • 24. Lex Predefined Variables
      • yytext -- a string containing the lexeme
      • yyleng -- the length of the lexeme
      • yyin -- the input stream pointer
          – the default input of the default main() is stdin
      • yyout -- the output stream pointer
          – the default output of the default main() is stdout
      • cs20:% ./a.out < inputfile > outfile
      • E.g.
          [a-z]+      printf("%s", yytext);
          [a-z]+      ECHO;
          [a-zA-Z]+   {words++; chars += yyleng;}
  • 25. Lex Library Routines
      • yylex()
          – The default main() contains a call of yylex()
      • yymore()
          – append the next matched text to the current yytext instead of replacing it
      • yyless(n)
          – retain the first n characters in yytext and return the rest to the input
      • yywrap()
          – is called whenever Lex reaches an end-of-file
          – The default yywrap() always returns 1
  • 26. Review of Lex Predefined Variables
          Name                Function
          char *yytext        pointer to matched string
          int yyleng          length of matched string
          FILE *yyin          input stream pointer
          FILE *yyout         output stream pointer
          int yylex(void)     call to invoke lexer, returns token
          yymore()            append the next match to yytext instead of replacing it
          int yyless(int n)   retain the first n characters in yytext
          int yywrap(void)    wrapup, return 1 if done, 0 if not done
          ECHO                write matched string
          REJECT              go to the next alternative rule
          INITIAL             initial start condition
          BEGIN condition     switch start condition
  • 27. User Subroutines Section
      • You can use your Lex routines in the same ways you use routines in other programming languages.
          %{
          void foo();
          %}
          letter [a-zA-Z]
          %%
          {letter}+   foo();
          %%
          …
          void foo() {
              …
          }
  • 28. User Subroutines Section (cont’d)
      • The section where main() is placed
          %{
          int counter = 0;
          %}
          letter [a-zA-Z]
          %%
          {letter}+   {printf("a word\n"); counter++;}
          %%
          int main(void) {
              yylex();
              printf("There are total %d words\n", counter);
              return 0;
          }
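
For comparison, the counting done by the {letter}+ rule can be sketched directly in C. This is only an illustration of what the rule matches (each maximal run of letters counts as one word); the function name is an assumption, and the code Lex generates is table-driven rather than written like this.

```c
#include <ctype.h>

/* Counts maximal runs of letters in `s`, which is what the
   {letter}+ rule matches one run at a time. */
int count_letter_runs(const char *s)
{
    int counter = 0;
    int in_word = 0;   /* 1 while the scan is inside a run of letters */
    for (; *s != '\0'; s++) {
        if (isalpha((unsigned char)*s)) {
            if (!in_word) { counter++; in_word = 1; }  /* a new word starts */
        } else {
            in_word = 0;                               /* the run has ended */
        }
    }
    return counter;
}
```

Note that digits split words here just as they do in the Lex rule: `ab1cd` counts as two words, ab and cd.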
  • 29. Usage
      • To run Lex on a source file, type
          lex scanner.l
      • It produces a file named lex.yy.c, which is a C program for the lexical analyzer.
      • To compile lex.yy.c, type
          cc lex.yy.c -ll
      • To run the lexical analyzer program, type
          ./a.out < inputfile
  • 30. Versions of Lex
      • AT&T -- lex: http://www.combo.org/lex_yacc_page/lex.html
      • GNU -- flex: http://www.gnu.org/manual/flex-2.5.4/flex.html
      • A Win32 version of flex: http://www.monmouth.com/~wstreett/lex-yacc/lex-yacc.html, or Cygwin: http://sources.redhat.com/cygwin/
      • Lex implementations on different machines are not all alike.
  • 31. Yacc - Yet Another Compiler-Compiler
  • 32. Introduction
      • What is YACC?
          – A tool which produces a parser for a given grammar.
          – YACC (Yet Another Compiler Compiler) is a program designed to compile an LALR(1) grammar and to produce the source code of the syntactic analyzer of the language generated by this grammar.
  • 33. How YACC Works
          gram.y (file containing the desired grammar in yacc format)
            ↓ yacc (the yacc program)
          y.tab.c (C source program created by yacc)
            ↓ cc or gcc (C compiler)
          a.out (executable program that will parse the grammar given in gram.y)
  • 34. How YACC Works
      (1) Parser generation time: YACC source (*.y) → yacc → y.tab.c, y.tab.h, y.output
      (2) Compile time: y.tab.c → C compiler/linker → a.out
      (3) Run time: token stream → a.out → abstract syntax tree
  • 35. A YACC File Example
          %{
          #include <stdio.h>
          %}
          %token NAME NUMBER
          %%
          statement: NAME '=' expression
                   | expression { printf("= %d\n", $1); }
                   ;
          expression: expression '+' NUMBER { $$ = $1 + $3; }
                    | expression '-' NUMBER { $$ = $1 - $3; }
                    | NUMBER { $$ = $1; }
                    ;
          %%
          int yyerror(char *s) {
              fprintf(stderr, "%s\n", s);
              return 0;
          }
          int main(void) {
              yyparse();
              return 0;
          }
  • 37. Works with Lex
      • The parser calls yylex()
      • The scanner matches [0-9]+ and reports that the next token is NUM
      • The parser sees the token stream NUM '+' NUM
  • 38. YACC File Format
          %{
          C declarations
          %}
          yacc declarations
          %%
          Grammar rules
          %%
          Additional C code
          – Comments enclosed in /* ... */ may appear in any of the sections.
  • 39. Definitions Section
          %{
          #include <stdio.h>
          #include <stdlib.h>
          %}
          %token ID NUM     (ID and NUM are terminals)
          %start expr       (parsing starts from expr)
  • 40. Start Symbol
      • By default, the first non-terminal specified in the grammar specification section.
      • To override it, use the %start declaration:
          %start non-terminal
  • 41. Rules Section
      • This section defines the grammar
      • Example
          expr : expr '+' term | term;
          term : term '*' factor | factor;
          factor : '(' expr ')' | ID | NUM;
  • 42. Rules Section
      • Normally written like this
      • Example:
          expr   : expr '+' term
                 | term
                 ;
          term   : term '*' factor
                 | factor
                 ;
          factor : '(' expr ')'
                 | ID
                 | NUM
                 ;
  • 43. The Position of Rules
          expr   : expr '+' term    { $$ = $1 + $3; }
                 | term             { $$ = $1; }
                 ;
          term   : term '*' factor  { $$ = $1 * $3; }
                 | factor           { $$ = $1; }
                 ;
          factor : '(' expr ')'     { $$ = $2; }
                 | ID
                 | NUM
                 ;
  • 44. The Position of Rules: $1 is the value of the first symbol on the right-hand side of an alternative. In the rules of slide 43 these are expr, term, term, factor, and '('.
  • 45. The Position of Rules: $2 is the value of the second symbol on the right-hand side. In the rules of slide 43 these are '+', '*', and the expr inside '(' expr ')'.
  • 46. The Position of Rules: $3 is the value of the third symbol on the right-hand side. In the rules of slide 43 these are term, factor, and ')'. An alternative without an action, such as ID or NUM, uses the default: $$ = $1;
  • 47. Communication between LEX and YACC
      • The parser calls yylex(); the scanner matches [0-9]+ and reports that the next token is NUM; the parser sees NUM '+' NUM
      • LEX and YACC need a shared way to agree on the identity of each token
  • 48. Communication between LEX and YACC
      • Use an enumeration / #define for the token codes
      • One side generates the definitions and the other side includes them
      • YACC generates y.tab.h:
          yacc -d gram.y
        will produce y.tab.h
      • LEX includes y.tab.h
  • 49. Communication between LEX and YACC
      • scanner.l
          %{
          #include <stdio.h>
          #include "y.tab.h"
          %}
          id [_a-zA-Z][_a-zA-Z0-9]*
          %%
          int     { return INT; }
          char    { return CHAR; }
          float   { return FLOAT; }
          {id}    { return ID; }
      • parser.y
          %{
          #include <stdio.h>
          #include <stdlib.h>
          %}
          %token CHAR, FLOAT, ID, INT
          %%
      • yacc -d xxx.y produces y.tab.h:
          # define CHAR 258
          # define FLOAT 259
          # define ID 260
          # define INT 261
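
The token codes are just integers shared by both sides. The following C sketch imitates the four scanner rules by hand to show what value reaches the parser for a given lexeme. The function name `classify` and the return value 0 for anything else are assumptions made for illustration; the #define values are copied from the y.tab.h on the slide.

```c
#include <ctype.h>
#include <string.h>

/* Token codes copied from the y.tab.h shown on the slide. */
#define CHAR  258
#define FLOAT 259
#define ID    260
#define INT   261

/* Hypothetical stand-in for the scanner rules: returns the token code
   for one complete lexeme, or 0 if it is neither a keyword nor an id. */
int classify(const char *lexeme)
{
    const char *p = lexeme;
    /* Keyword rules come first, as in scanner.l, so they win over {id}. */
    if (strcmp(lexeme, "int") == 0)   return INT;
    if (strcmp(lexeme, "char") == 0)  return CHAR;
    if (strcmp(lexeme, "float") == 0) return FLOAT;
    /* id: [_a-zA-Z][_a-zA-Z0-9]* */
    if (*p != '_' && !isalpha((unsigned char)*p)) return 0;
    for (p++; *p != '\0'; p++)
        if (*p != '_' && !isalnum((unsigned char)*p)) return 0;
    return ID;
}
```

Here `classify("int")` gives 261 (INT), while `classify("integer")` gives 260 (ID). In the real scanner the same outcome follows from Lex preferring the longest match and, for equal lengths, the earlier rule.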
  • 50. YACC
      • Rules may be recursive
      • Rules may be ambiguous*
      • Uses bottom-up shift/reduce parsing
          – Get a token
          – Push it onto the stack
          – Can the stack be reduced? (How do we know?)
              • If yes: reduce using a rule
              • If no: get another token
      • Yacc cannot look ahead more than one token:
          Phrase -> cart_animal AND CART
                  | work_animal AND PLOW
          …
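
The shift/reduce loop described above can be sketched in C for a tiny grammar, E -> E '+' NUM | NUM. This is a hand-coded illustration under stated assumptions: each token is a single character ('n' stands for NUM), the reduce checks are written out by hand, and the function name `parse` is hypothetical. Yacc itself decides when to reduce by consulting generated LALR(1) tables, not checks like these.

```c
#include <string.h>

/* Returns 1 if `tokens` is accepted by E -> E '+' NUM | NUM, else 0.
   Each character is one token: 'n' is NUM and '+' is itself. */
int parse(const char *tokens)
{
    char stack[64];
    int top = 0;                        /* number of symbols on the stack */
    size_t i, n = strlen(tokens);

    for (i = 0; i < n; i++) {
        if (top == (int)sizeof stack) return 0;    /* stack full: give up */
        stack[top++] = tokens[i];                  /* shift: push the token */
        /* reduce for as long as a right-hand side sits on top of the stack */
        for (;;) {
            if (top >= 3 && stack[top - 3] == 'E' &&
                stack[top - 2] == '+' && stack[top - 1] == 'n') {
                top -= 3;
                stack[top++] = 'E';                /* E -> E '+' NUM */
            } else if (top == 1 && stack[0] == 'n') {
                stack[0] = 'E';                    /* E -> NUM */
            } else {
                break;                             /* nothing to reduce: get another token */
            }
        }
    }
    return top == 1 && stack[0] == 'E';            /* accept if only E remains */
}
```

With this sketch `parse("n+n+n")` accepts, while `parse("n+")` and `parse("nn")` are rejected because the stack cannot be reduced to a single E.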
  • 51. Yacc Example
      • Taken from Lex & Yacc
      • Simple calculator
          a = 4 + 6
          a
          a=10
          b = 7
          c = a + b
          c
          c = 17
          $
  • 52. Grammar
          expression ::= expression '+' term
                       | expression '-' term
                       | term
          term       ::= term '*' factor
                       | term '/' factor
                       | factor
          factor     ::= '(' expression ')'
                       | '-' factor
                       | NUMBER
                       | NAME
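
To see how this grammar shapes evaluation, here is a minimal recursive-descent sketch in C. It is not how Yacc parses (Yacc builds a bottom-up LALR(1) parser); the left-recursive rules are rewritten as loops, NAME is left out because it needs a symbol table, and there is no error handling. All function names are assumptions made for illustration.

```c
#include <stdlib.h>

static const char *cur;   /* current position in the input string */

static double expression(void);

static void skip_blanks(void)
{
    while (*cur == ' ' || *cur == '\t') cur++;
}

/* factor ::= '(' expression ')' | '-' factor | NUMBER */
static double factor(void)
{
    char *end;
    double v;
    skip_blanks();
    if (*cur == '(') {
        cur++;
        v = expression();
        skip_blanks();
        if (*cur == ')') cur++;
        return v;
    }
    if (*cur == '-') { cur++; return -factor(); }
    v = strtod(cur, &end);          /* NUMBER */
    cur = end;
    return v;
}

/* term ::= term '*' factor | term '/' factor | factor */
static double term(void)
{
    double v = factor();
    for (;;) {
        skip_blanks();
        if (*cur == '*')      { cur++; v *= factor(); }
        else if (*cur == '/') { cur++; v /= factor(); }
        else return v;
    }
}

/* expression ::= expression '+' term | expression '-' term | term */
static double expression(void)
{
    double v = term();
    for (;;) {
        skip_blanks();
        if (*cur == '+')      { cur++; v += term(); }
        else if (*cur == '-') { cur++; v -= term(); }
        else return v;
    }
}

/* Evaluates one expression held in `text`. */
double evaluate(const char *text)
{
    cur = text;
    return expression();
}
```

Because term sits below expression in the grammar, `evaluate("1 - 2 * 3")` multiplies first and gives -5, and because the loops accumulate from the left, `evaluate("1 - 2 - 3")` gives -4.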
  • 53. Parser (cont’d): parser.y
          statement_list: statement '\n'
                        | statement_list statement '\n'
                        ;
          statement: NAME '=' expression { $1->value = $3; }
                   | expression { printf("= %g\n", $1); }
                   ;
          expression: expression '+' term { $$ = $1 + $3; }
                    | expression '-' term { $$ = $1 - $3; }
                    | term
                    ;
  • 54. Parser (cont’d): parser.y
          term: term '*' factor { $$ = $1 * $3; }
              | term '/' factor { if ($3 == 0.0)
                                      yyerror("divide by zero");
                                  else
                                      $$ = $1 / $3;
                                }
              | factor
              ;
          factor: '(' expression ')' { $$ = $2; }
                | '-' factor { $$ = -$2; }
                | NUMBER { $$ = $1; }
                | NAME { $$ = $1->value; }
                ;
          %%
  • 55. Scanner: scanner.l
          %{
          #include "y.tab.h"
          #include "parser.h"
          #include <math.h>
          #include <stdlib.h>   /* for atof() */
          %}
          %%
          ([0-9]+|([0-9]*\.[0-9]+)([eE][-+]?[0-9]+)?) {
                      yylval.dval = atof(yytext);
                      return NUMBER;
                  }
          [ \t]   ;   /* ignore white space */
  • 56. Scanner (cont’d): scanner.l
          [A-Za-z][A-Za-z0-9]* {
                      /* return symbol pointer */
                      yylval.symp = symlook(yytext);
                      return NAME;
                  }
          "$"     { return 0; /* end of input */ }
          \n|"="|"+"|"-"|"*"|"/"   return yytext[0];
          %%
  • 57. YACC Command
      • Yacc (AT&T)
          – yacc -d xxx.y
      • Bison (GNU)
          – bison -d -y xxx.y
          – With -y, bison produces y.tab.c, the same as yacc; without it, bison produces xxx.tab.c
  • 58. Precedence / Associativity
      (1) 1 - 2 - 3: is it (1-2)-3 or 1-(2-3)? Declare the '-' operator as left-associative, so it is (1-2)-3.
      (2) 1 - 2 * 3 = 1-(2*3): declare the '*' operator to have higher precedence than the '-' operator.
  • 59. Precedence / Associativity
          %right '='
          %left '<' '>' NE LE GE
          %left '+' '-'
          %left '*' '/'       (declared last: highest precedence)
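
How a table of precedence levels and associativity resolves an expression can be sketched in C with precedence climbing. This is a different technique from Yacc's, which uses the declarations to settle shift/reduce conflicts in its tables, but it yields the same groupings. The operator table, the function names, and the '^' operator (added only to show a right-associative case) are assumptions; the sketch handles non-negative integers with no blanks or parentheses.

```c
#include <ctype.h>
#include <stddef.h>

struct op { char symbol; int level; int right_assoc; };

/* Higher levels bind tighter, like later %left/%right lines in yacc.
   '^' is an assumed right-associative power operator, not in the slides. */
static const struct op ops[] = {
    { '+', 1, 0 }, { '-', 1, 0 },
    { '*', 2, 0 }, { '/', 2, 0 },
    { '^', 3, 1 },
};

static const struct op *find_op(char c)
{
    size_t i;
    for (i = 0; i < sizeof ops / sizeof ops[0]; i++)
        if (ops[i].symbol == c) return &ops[i];
    return 0;   /* not an operator (includes end of string) */
}

static long apply(char sym, long a, long b)
{
    long r = 1;
    switch (sym) {
    case '+': return a + b;
    case '-': return a - b;
    case '*': return a * b;
    case '/': return b != 0 ? a / b : 0;   /* division by zero yields 0 in this sketch */
    default:  while (b-- > 0) r *= a;      /* '^': repeated multiplication */
              return r;
    }
}

/* Parses digits and operators starting at *p, consuming only
   operators whose level is at least min_level. */
static long climb(const char **p, int min_level)
{
    long left = 0;
    const struct op *o;
    while (isdigit((unsigned char)**p)) left = left * 10 + (*(*p)++ - '0');
    while ((o = find_op(**p)) != 0 && o->level >= min_level) {
        long right;
        (*p)++;
        /* left-associative: the right operand may only contain tighter operators */
        right = climb(p, o->right_assoc ? o->level : o->level + 1);
        left = apply(o->symbol, left, right);
    }
    return left;
}

long eval(const char *text) { return climb(&text, 1); }
```

With this table `eval("1-2-3")` is -4 (left-associative), `eval("1-2*3")` is -5 ('*' binds tighter), and `eval("2^3^2")` is 512 because the assumed '^' groups to the right.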
  • 60. Precedence / Associativity
      • Declarations:
          %left '+' '-'
          %left '*' '/'
          %nonassoc UMINUS
      • Rules:
          expr : expr '+' expr { $$ = $1 + $3; }
               | expr '-' expr { $$ = $1 - $3; }
               | expr '*' expr { $$ = $1 * $3; }
               | expr '/' expr { if ($3 == 0)
                                     yyerror("divide 0");
                                 else
                                     $$ = $1 / $3;
                               }
               | '-' expr %prec UMINUS { $$ = -$2; }
  • 61. Shift/Reduce Conflicts
      • shift/reduce conflict
          – occurs when a grammar is written in such a way that a decision between shifting and reducing cannot be made.
          – ex: the IF-ELSE ambiguity.
      • To resolve this conflict, yacc will choose to shift.
  • 62. YACC Declaration Summary
          %start   Specify the grammar's start symbol
          %union   Declare the collection of data types that semantic values may have
          %token   Declare a terminal symbol (token type name) with no precedence or associativity specified
          %type    Declare the type of semantic values for a nonterminal symbol
  • 63. YACC Declaration Summary
          %right      Declare a terminal symbol (token type name) that is right-associative
          %left       Declare a terminal symbol (token type name) that is left-associative
          %nonassoc   Declare a terminal symbol (token type name) that is nonassociative (using it in a way that would be associative is a syntax error, e.g. x op y op z)
  • 64. Reference Books
      • lex & yacc, 2nd Edition
          – by John R. Levine, Tony Mason & Doug Brown
          – O’Reilly
          – ISBN: 1-56592-000-7
      • Mastering Regular Expressions
          – by Jeffrey E.F. Friedl
          – O’Reilly
          – ISBN: 1-56592-257-3