Lex & Yacc
Dr. R. Suganya
Lex
• lex is a program (generator) that generates lexical analyzers (widely used on Unix).
• It is mostly used with the Yacc parser generator.
• Written by Eric Schmidt and Mike Lesk.
• It reads an input specification (describing the lexical analyzer) and outputs C source code implementing that lexical analyzer.
• Lex reads patterns (regular expressions) and produces C code for a lexical analyzer that scans its input for matching lexemes.
Hathal & Ahmad
Lex
◦ A simple pattern: letter(letter|digit)*
• This pattern matches a string of characters that begins with a single letter followed by zero or more letters or digits.
• Regular expressions are translated by lex into a computer program that mimics an FSA (finite state automaton).
Lex
• One limitation: Lex cannot be used to recognize nested structures such as balanced parentheses, since it only has states and transitions between states (no stack).
• So Lex is good at pattern matching, while Yacc handles the more challenging task of parsing.
Lex
Pattern Matching Primitives
Lex
• Pattern Matching examples.
Lex
• The input structure to Lex:

……Definitions section……
%%
……Rules section……
%%
……C code section (subroutines)……

• ECHO is an action: a predefined macro in lex that writes out the text matched by the pattern.
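Putting the three sections together, a minimal complete specification might look like this (a sketch; the particular rules are illustrative, not from the slides):

```lex
%{
/* Definitions section: copied verbatim into the generated C file */
#include <stdio.h>
%}
%%
[0-9]+   printf("NUMBER ");   /* Rules section: pattern then action */
.|\n     ECHO;                /* ECHO writes out the matched text */
%%
/* C code section (subroutines) */
int main(void) { return yylex(); }
```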
Lex
Lex predefined variables.
Lex
• Whitespace must separate the defining term and the associated expression.
• Code in the definitions section is copied as-is to the top of the generated C file and must be bracketed with "%{" and "%}" markers.
• Substitutions in the rules section are surrounded by braces ({letter}) to distinguish them from literals.
Yacc
 Theory:
◦ Yacc reads the grammar and generates C code for a parser.
◦ Grammars are written in Backus-Naur Form (BNF).
◦ BNF grammars are used to express context-free languages.
◦ e.g. to parse an expression, do the reverse operation (reducing the expression).
◦ This is known as bottom-up or shift-reduce parsing.
◦ A stack (LIFO) is used for storage.
Yacc
• Input to yacc is divided into three sections.
... definitions ...
%%
... rules ...
%%
... subroutines ...
Yacc
• The definitions section consists of:
◦ token declarations
◦ C code bracketed by "%{" and "%}"
• The rules section consists of:
◦ BNF grammar
• The subroutines section consists of:
◦ user subroutines
Yacc & Lex Together
• The grammar:
program -> program expr | ε
expr -> expr + expr | expr - expr | id
• program and expr are nonterminals.
• id is a terminal (a token returned by lex).
• An expression may be:
◦ the sum of two expressions
◦ the difference of two expressions
◦ or an identifier
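For reference, that grammar might be written in a yacc rules section roughly as follows (a sketch; the semantic actions are illustrative assumptions, not from the slides):

```yacc
%token ID
%%
program : program expr
        |                    /* empty (the ε alternative) */
        ;
expr    : expr '+' expr      { $$ = $1 + $3; }
        | expr '-' expr      { $$ = $1 - $3; }
        | ID                 /* token value supplied by lex via yylval */
        ;
```

Note that as written the grammar is ambiguous; a real specification would add precedence declarations such as `%left '+' '-'` to resolve the shift/reduce conflicts.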
Lex file

Yacc file

Linking lex & yacc
The Lex and Flex Scanner Generators
• Lex and its newer cousin flex are scanner generators.
• They systematically translate regular definitions into C source code for efficient scanning.
• The generated code is easy to integrate in C applications.
Creating a Lexical Analyzer with Lex and Flex

lex source program (lex.l) → [lex or flex compiler] → lex.yy.c
lex.yy.c → [C compiler] → a.out
input stream → [a.out] → sequence of tokens
Lex Specification
• A lex specification consists of three parts:

regular definitions, C declarations in %{ %}
%%
translation rules
%%
user-defined auxiliary procedures

• The translation rules are of the form:

p1 { action1 }
p2 { action2 }
...
pn { actionn }
Regular Expressions in Lex

x        match the character x
\.       match the character .
"string" match contents of string of characters
.        match any character except newline
^        match beginning of a line
$        match the end of a line
[xyz]    match one character x, y, or z (use \ to escape -)
[^xyz]   match any character except x, y, and z
[a-z]    match one of a to z
r*       closure (match zero or more occurrences)
r+       positive closure (match one or more occurrences)
r?       optional (match zero or one occurrence)
r1r2     match r1 then r2 (concatenation)
r1|r2    match r1 or r2 (union)
(r)      grouping
r1/r2    match r1 when followed by r2 (lookahead)
{d}      match the regular expression defined by d
Example Lex Specification 1

%{
#include <stdio.h>
%}
%%
[0-9]+  { printf("%s\n", yytext); }
.|\n    { }
%%
main()
{ yylex();
}

• yytext contains the matching lexeme; main() invokes the lexical analyzer by calling yylex().
• The two pattern/action lines between the %% markers are the translation rules.
• To build and run:

lex spec.l
gcc lex.yy.c -ll
./a.out < spec.l
Example Lex Specification 2

%{
#include <stdio.h>
int ch = 0, wd = 0, nl = 0;
%}
delim [ \t]+
%%
\n       { ch++; wd++; nl++; }
^{delim} { ch+=yyleng; }
{delim}  { ch+=yyleng; wd++; }
.        { ch++; }
%%
main()
{ yylex();
  printf("%8d%8d%8d\n", nl, wd, ch);
}

• delim is a regular definition; the pattern/action lines between the %% markers are the translation rules.
Example Lex Specification 3

%{
#include <stdio.h>
%}
digit  [0-9]
letter [A-Za-z]
id     {letter}({letter}|{digit})*
%%
{digit}+ { printf("number: %s\n", yytext); }
{id}     { printf("ident: %s\n", yytext); }
.        { printf("other: %s\n", yytext); }
%%
main()
{ yylex();
}

• digit, letter, and id are regular definitions; the pattern/action lines between the %% markers are the translation rules.
Example Lex Specification 4

%{ /* definitions of manifest constants */
#define LT (256)
…
%}
delim  [ \t\n]
ws     {delim}+
letter [A-Za-z]
digit  [0-9]
id     {letter}({letter}|{digit})*
number {digit}+(\.{digit}+)?(E[+-]?{digit}+)?
%%
{ws}     { }
if       {return IF;}
then     {return THEN;}
else     {return ELSE;}
{id}     {yylval = install_id(); return ID;}
{number} {yylval = install_num(); return NUMBER;}
"<"      {yylval = LT; return RELOP;}
"<="     {yylval = LE; return RELOP;}
"="      {yylval = EQ; return RELOP;}
"<>"     {yylval = NE; return RELOP;}
">"      {yylval = GT; return RELOP;}
">="     {yylval = GE; return RELOP;}
%%
int install_id()
…

• Each action returns a token code to the parser; yylval carries the token's attribute.
• install_id installs yytext as an identifier in the symbol table.
Design of a Lexical Analyzer Generator
• Translate regular expressions to an NFA.
• Translate the NFA to an efficient DFA (this second step is optional).

regular expressions → NFA → DFA
NFA: simulate the NFA to recognize tokens
DFA: simulate the DFA to recognize tokens