Lex & Yacc
Dr. R. Suganya
Lex
• lex is a program (generator) that generates lexical analyzers (widely used on Unix).
• It is mostly used with the Yacc parser generator.
• Written by Eric Schmidt and Mike Lesk.
• It reads an input specification (describing the lexical analyzer) and outputs C source code implementing that lexical analyzer.
• Lex reads patterns (regular expressions) and produces C code for a lexical analyzer that scans its input for matching lexemes.
Hathal & Ahmad
Lex
◦ A simple pattern: letter(letter|digit)*
• This pattern matches a string of characters that begins with a single letter followed by zero or more letters or digits.
• Regular expressions are translated by lex into a computer program that mimics an FSA (finite state automaton).
Lex
• One limitation: Lex cannot be used to recognize nested structures such as balanced parentheses, since it only has states and transitions between states (no stack).
• So Lex is good at pattern matching, while Yacc handles the more challenging task of parsing.
Lex
Pattern Matching Primitives
Lex
• Pattern Matching examples.
Lex
• The input structure to Lex:

……Definitions section……
%%
……Rules section……
%%
……C code section (subroutines)……

• ECHO is an action: a predefined macro in lex that writes out the text matched by the pattern.
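Putting the three sections together, a minimal complete specification might look like this (a sketch; the particular rules are illustrative, not from the slides):

```lex
%{
/* Definitions section: copied verbatim into the generated C file */
#include <stdio.h>
%}
%%
[0-9]+   printf("NUMBER ");   /* Rules section: pattern then action */
.|\n     ECHO;                /* ECHO writes out the matched text */
%%
/* C code section (subroutines) */
int main(void) { return yylex(); }
```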
Lex
Lex predefined variables.
Lex
• Whitespace must separate the defining term and the associated expression.
• Code in the definitions section is copied as-is to the top of the generated C file and must be bracketed with "%{" and "%}" markers.
• Substitutions in the rules section are surrounded by braces ({letter}) to distinguish them from literals.
Yacc
 Theory:
◦ Yacc reads the grammar and generates C code for a parser.
◦ Grammars are written in Backus-Naur Form (BNF).
◦ BNF grammars are used to express context-free languages.
◦ e.g. to parse an expression, do the reverse operation (reducing the expression).
◦ This is known as bottom-up or shift-reduce parsing.
◦ A stack (LIFO) is used for storage.
Yacc
• Input to yacc is divided into three sections.
... definitions ...
%%
... rules ...
%%
... subroutines ...
Yacc
• The definitions section consists of:
◦ token declarations
◦ C code bracketed by "%{" and "%}"
• The rules section consists of:
◦ BNF grammar
• The subroutines section consists of:
◦ user subroutines
Yacc & Lex Together
• The grammar:
program -> program expr | ε
expr -> expr + expr | expr - expr | id
• program and expr are nonterminals.
• id is a terminal (a token returned by lex).
• An expression may be:
◦ the sum of two expressions
◦ the difference of two expressions
◦ or an identifier
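For reference, that grammar might be written in a yacc rules section roughly as follows (a sketch; the semantic actions are illustrative assumptions, not from the slides):

```yacc
%token ID
%%
program : program expr
        |                    /* empty (the ε alternative) */
        ;
expr    : expr '+' expr      { $$ = $1 + $3; }
        | expr '-' expr      { $$ = $1 - $3; }
        | ID                 /* token value supplied by lex via yylval */
        ;
```

Note that as written the grammar is ambiguous; a real specification would add precedence declarations such as `%left '+' '-'` to resolve the shift/reduce conflicts.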
Lex file

Yacc file

Linking lex & yacc
The Lex and Flex Scanner Generators
• Lex and its newer cousin flex are scanner generators.
• They systematically translate regular definitions into C source code for efficient scanning.
• The generated code is easy to integrate in C applications.
Creating a Lexical Analyzer with Lex and Flex

lex source program (lex.l) → [lex or flex compiler] → lex.yy.c
lex.yy.c → [C compiler] → a.out
input stream → [a.out] → sequence of tokens
Lex Specification
• A lex specification consists of three parts:

regular definitions, C declarations in %{ %}
%%
translation rules
%%
user-defined auxiliary procedures

• The translation rules are of the form:

p1 { action1 }
p2 { action2 }
...
pn { actionn }
Regular Expressions in Lex

x        match the character x
\.       match the character .
"string" match contents of string of characters
.        match any character except newline
^        match beginning of a line
$        match the end of a line
[xyz]    match one character x, y, or z (use \ to escape -)
[^xyz]   match any character except x, y, and z
[a-z]    match one of a to z
r*       closure (match zero or more occurrences)
r+       positive closure (match one or more occurrences)
r?       optional (match zero or one occurrence)
r1r2     match r1 then r2 (concatenation)
r1|r2    match r1 or r2 (union)
(r)      grouping
r1/r2    match r1 when followed by r2 (lookahead)
{d}      match the regular expression defined by d
Example Lex Specification 1

%{
#include <stdio.h>
%}
%%
[0-9]+  { printf("%s\n", yytext); }
.|\n    { }
%%
main()
{ yylex();
}

• yytext contains the matching lexeme; main() invokes the lexical analyzer by calling yylex().
• The two pattern/action lines between the %% markers are the translation rules.
• To build and run:

lex spec.l
gcc lex.yy.c -ll
./a.out < spec.l
Example Lex Specification 2

%{
#include <stdio.h>
int ch = 0, wd = 0, nl = 0;
%}
delim [ \t]+
%%
\n       { ch++; wd++; nl++; }
^{delim} { ch+=yyleng; }
{delim}  { ch+=yyleng; wd++; }
.        { ch++; }
%%
main()
{ yylex();
  printf("%8d%8d%8d\n", nl, wd, ch);
}

• delim is a regular definition; the pattern/action lines between the %% markers are the translation rules.
Example Lex Specification 3

%{
#include <stdio.h>
%}
digit  [0-9]
letter [A-Za-z]
id     {letter}({letter}|{digit})*
%%
{digit}+ { printf("number: %s\n", yytext); }
{id}     { printf("ident: %s\n", yytext); }
.        { printf("other: %s\n", yytext); }
%%
main()
{ yylex();
}

• digit, letter, and id are regular definitions; the pattern/action lines between the %% markers are the translation rules.
Example Lex Specification 4

%{ /* definitions of manifest constants */
#define LT (256)
…
%}
delim  [ \t\n]
ws     {delim}+
letter [A-Za-z]
digit  [0-9]
id     {letter}({letter}|{digit})*
number {digit}+(\.{digit}+)?(E[+-]?{digit}+)?
%%
{ws}     { }
if       {return IF;}
then     {return THEN;}
else     {return ELSE;}
{id}     {yylval = install_id(); return ID;}
{number} {yylval = install_num(); return NUMBER;}
"<"      {yylval = LT; return RELOP;}
"<="     {yylval = LE; return RELOP;}
"="      {yylval = EQ; return RELOP;}
"<>"     {yylval = NE; return RELOP;}
">"      {yylval = GT; return RELOP;}
">="     {yylval = GE; return RELOP;}
%%
int install_id()
…

• Each action returns a token code to the parser; yylval carries the token's attribute.
• install_id installs yytext as an identifier in the symbol table.
Design of a Lexical Analyzer Generator
• Translate regular expressions to an NFA.
• Translate the NFA to an efficient DFA (this second step is optional).

regular expressions → NFA → DFA
NFA: simulate the NFA to recognize tokens
DFA: simulate the DFA to recognize tokens