Compiler Design_LEX Tool for Lexical Analysis.pptx

LEX Tool
by
Dr. R. A. Deshmukh

LEX TOOL
 A tool widely used to specify lexical
analyzers for a variety of languages or the
breaking up of an input stream into
meaningful units, or tokens.
 For example, consider breaking a text file
up into individual words.
We refer to the tool as Lex compiler and to
its input specification as the Lex language.

LEX TOOL Lexical Analyzer Generator
Lexical
Compiler
Lex
Source
program
lex.l
lex.yy.c
C
compiler
lex.yy.
c
a.out
a.out
Input
stream
Sequence
of tokens

Structure of Lex programs
declarations
%%
translation rules
%%
auxiliary
functions
Pattern
{Action}

Internal Structure of Lex
Lex
Regular
expressions
NFA DFA
Minimal
DFA
The final states of the DFA are
associated with actions

Declaration
The declarations section
includes declarations of
variables,manifest constants(A
manifest constant is an
identifier that is declared to
represent a constant e.g.
# define PIE 3.14), the files to be
included. (written in %{ %} )
 and definitions of regular

Translation rules
The translation rules of a Lex program are
statements of the form :
p1 {action 1}
p2 {action 2}
p3 {action 3}
… …
where each p is a regular expression and
each action is a program fragment
describing what action the lexical
analyzer should take when a string
pattern matches with p matches a lexeme.
In Lex the actions are written in C.

Auxiliary procedures
The third section holds whatever
auxiliary procedures are needed by the
actions.
Alternatively these procedures can be
compiled separately and loaded with
the lexical analyzer. The auxiliary
procedures are written in C language.

How does this Lexical analyzer work?
The lexical analyzer created by Lex behaves
in concert with a parser in the following
manner.
When activated by the parser, the lexical
analyzer begins reading its remaining
input , one character at a time, until it has
found the longest prefix of the input that is
matched by one of the regular expressions p
contd…

How does this Lexical analyzer work?
then it executes the corresponding action. Typically
the action will return control to the parser. However, if
it does not, then the lexical analyzer proceeds to find
more lexemes, until an action causes control to
return to the parser. The repeated search for lexemes
until an explicit return allows the lexical analyzer to
process white space and comments conveniently.
 The lexical analyzer returns a single quantity,
the token, to the parser. To pass an attribute
value with information about the lexeme, we
can set the global variable yylval.
contd…

 e.g. Suppose the lexical analyzer
returns a single token for all the
relational operators, in which case
the parser won’t be able to
distinguish between
”<=”, ”>=”, ”<”, ”>”, ”==” etc. We
can set yylval appropriately to
specify the nature of the operator.

The two variables yytext and yyleng
 Lex makes the lexeme available to the
routines appearing in the third
section through two variables yytext
and yyleng
1) yytext is a variable that is a pointer
to the first character of the lexeme.
2)yyleng is an integer telling how long
the lexeme is.

A lexeme may match more than one
patterns. How is this problem resolved?
 Take for example the lexeme if. It matches the
patterns for both keyword if and identifier. If the
pattern for keyword if precedes the pattern for
identifier in the declaration list of the lex program
the conflict is resolved in favour of the keyword.
In general this ambiguity-resolving strategy makes
it easy to reserve keywords by listing them ahead of
the pattern for identifiers.
The Lex’s strategy of selecting the longest
prefix matched by a pattern makes it easy to
resolve other conflicts like the one between “<” and
“<=”.

Regular Expression Basics
 . : matches any single character except n
 * : matches 0 or more instances of the preceding regular
expression
 + : matches 1 or more instances of the preceding regular
expression
 ? : matches 0 or 1 instance of the preceding regular
expression
 | : matches the preceding or following regular
expression
 [ ] : defines a character class
 () : groups enclosed regular expression into a new
regular expression
 “…”: matches everything within the “ “ literally

Regular Expression Basics
x|y : x or y
{i} : definition of i
x/y : x, only if followed by y (y not removed
from input)
x{m,n} : m to n occurrences of x
 x :x, but only at beginning of line
x$ : x, but only at end of line
“s” : exactly what is in the quotes
A regular expression finishes with a space,
tab or
newline.

Pattern Matching Examples
Expression Matches
abc* ab abc abcc abccc ...
abc+ abc abcc abccc ...
a(bc)+ abc abcbc abcbcbc ...
a(bc)? a abc
[abc] one of: a, b, c
[a-z] any letter, a-z
[a-z] one of: a, -, z
[-az] one of: -, a, z
[A-Za-z0-9]+ one or more alphanumeric
characters
[ tn]+ whitespace
[^ab] anything except: a, b
[a^b] one of: a, ^, b

Lex Predefined Variables
Name Function
int yylex(void) call to invoke lexer, returns
token
char *yytext pointer to matched string
yyleng length of matched string
yylval value associated with token
int yywrap(void) wrapup, return 1 if done, 0 if
not done
FILE *yyout output file
FILE *yyin input file
ECHO write matched string

Usage
 To run Lex on a source file, type
lex scanner.l
 It produces a file named lex.yy.c which is a C
program for the lexical analyzer.
 To compile lex.yy.c, type
gcc lex.yy.c
 To run the lexical analyzer program, type
1) ./a.out <inputfile
2) ./a.out
3) ./a.out <inputfile >outputfile

Program to count no. of words
%{
int counter = 0;
%}
letter [a-zA-Z]
%%
{letter}+ {printf(“a wordn”); counter++;}
%%
main() {
yylex();
printf(“There are total %d wordsn”, counter);
}
int yywrap(void) {
return 1;
}
[root@localhost ~]# lex a1.l
[root@localhost ~]# gcc
lex.yy.c
[root@localhost ~]# ./a.out
pune
a word
bombay
a word

Program to count no. of identifiers
digit [0-9]
letter [a-zA-Z]
%{
# include<stdio.h>
%}
%%
{letter}({letter}|{digit})* {printf(“id: %sn”, yytext);}
n{printf(“new linen”);}
%%
main() {
yylex();
}
int yywrap(void) {
return 1;
}

[root@localhost ~]# gcc -o a3
lex.yy.c
[root@localhost ~]# ./a3
abc ghg jkhjk
id: abc
id: ghg
id: jkhjk
new line
[root@localhost ~]# ./a3 <a.txt
id: Today
id: is
id: Friday
new line
id: Environment
id: is
id: good
new line
[root@localhost ~]# cat
a.txt
Today is Friday
Environment is good

Program to count no. of characters,
words and lines in a program
%{
int nchar=0, nword=0, nline=0;
%}
%%
n { nline++; nchar++; }
[^ tn]+ { nword++, nchar += yyleng; }
. { nchar++; }
%%
int main(void) {
yylex();
printf("%dt%dt%dn", nchar, nword, nline);
return 0;
}

[root@localhost ~]# gcc -o a2
lex.yy.c
[root@localhost ~]# ./a2
hello good morning
how r u
28 6 2
[root@localhost ~]#

Program to count no. of lines, tabs, characters,spaces
and words in a program
%{
# include<stdio.h>
int lc=0,tc=0,sc=0,wc=0,cc=0;
%}
%%
[^ tn]+ {cc+=yyleng; wc++;}
[ ] {cc++;sc++;}
[t] {cc++; tc++;}
n {cc++; lc++;}
%%
main()
{
yyin = fopen("a1.txt","r");
yyout=fopen("a2.txt","w");
yylex();
fclose(yyin);
fclose(yyout);
}
int yywrap()
{
fprintf(yyout,"no. of lines =%dn",lc);
fprintf(yyout,"no. of tabs=%dn",tc);
fprintf(yyout,"no. of characters = %d
n",cc);
fprintf(yyout,"no. of spaces = %d
n",sc);
fprintf(yyout,"no. of words=%dn",wc);
return 1;
}

[root@localhost ~]# lex l1.l
[root@localhost ~]# gcc -o l1 lex.yy.c
[root@localhost ~]# ./l1
[root@localhost ~]# cat a1.txt
Today is Friday
Environment is good
[root@localhost ~]# cat a2.txt
no. of lines =2
no. of tabs=4
no. of characters = 39
no. of spaces = 3
no. of words=6
[root@localhost ~]#

%{
# include<stdio.h>
int pcount=0,ncount=0;
%}
pos_num [+]?[0-9]+(.[0-9]+)?([eE][-+]?[0-9]+)?
neg_num [-][0-9]+(.[0-9]+)?([eE][-+]?[0-9]+)?
%%
{pos_num} {printf("positive number = %sn",yytext);pcount+
+;}
{neg_num} {printf("negative number = %sn",yytext);ncount+
+;}
.|n {}
%%
main()
{
yylex();
}
int yywrap()
{
printf("no. of positive numbers = %dn",pcount);
printf("no. of negative numbers = %dn",ncount);
Program to count positive and negative numbers in a program

[root@localhost ~]# lex num.l
[root@localhost ~]# gcc -o num lex.yy.c
[root@localhost ~]# ./num
23 -90
positive number = 23
negative number = -90
90.78E+4
positive number = 90.78E+4
no. of positive numbers = 2
no. of negative numbers = 1
[root@localhost ~]#

%{
# include<stdio.h>
# include<string.h>
struct symtab
{
char *name;
};
struct symtab sym[10],*k;
struct symtab *install_id(char
*s);
void disp();
%}

L [a-zA-Z]
D [0-9]
id {L}({L}|{D})*
num {D}+(.{D}+)?([eE][-+]?{D}+)?
bop [-+*/=]
uop "++"|"--"
relop "<"|"<="|">"|">="|"!="|"=="
lop "&&"|"||"
bitlop [&|!]
kew
“if"|"else"|"while"|"do"|"for"|"int"|"char"|"float"
pun [,;'"[]{})(]
comment "/*"(.|n)*"*/"|"//"(.)*
ws [ tn]+
st "(.)*"

%%
{ws} { }
{kew} {printf("keyword =%sn",yytext);}
{id} {k=install_id(yytext);printf("identifier =%sn",yytext);}
{num} {printf("constant =%sn",yytext);}
{bop} {printf("binary op =%sn",yytext);}
{uop} {printf("unary op =%sn",yytext);}
{relop} {printf("relational op=%sn",yytext);}
{lop} {printf("logical op =%sn",yytext);}
{pun} {printf("punct =%sn",yytext);}
{bitlop} {printf("bitwise logical op=%sn",yytext);}
{comment} {printf("comment =%sn",yytext);}
{st} {printf("string =%sn",yytext);}
%%

main()
{
yylex();
disp();
}
struct symtab *install_id(char *s)
{
struct symtab *p;
//printf("in symbol tablen");
for(p=&sym[0];p<&sym[10];p++)
{
if(p->name && !strcmp(s,p->name))
return p;
if(!p->name)
{
p->name=strdup(s);
return p;
}
}
}
void disp()
{
struct symtab *p;
printf("symbol tablen");
for(p=&sym[0];p<&sym[10];p++)
{
if(p->name)
printf("%sn",p->name);
}
}
int yywrap()
{
return 1;
}

[root@localhost ~]# lex la.l
[root@localhost ~]# gcc -o la lex.yy.c
[root@localhost ~]# ./la <b.txt
identifier =main
punct =(
punct =)
punct ={
keyword =int
identifier =a
punct =,
identifier =b
punct =;
identifier =a
binary op ==
identifier =b
binary op =+
identifier =c
punct =;
keyword =if
punct =(
identifier =a
relational op=<=
identifier =m
punct =)
identifier =main
punct =(
punct =)
punct ={
keyword =int
identifier =a
punct =,
identifier =b
punct =;
identifier =a
binary op ==
identifier =b
binary op =+
identifier =c
punct =;
keyword =if
punct =(
identifier =a
relational op=<=
identifier =m
punct =)
identifier =main
punct =(
punct =)
punct ={
keyword =int
identifier =a
punct =,
identifier =b
punct =;
identifier =a
binary op ==
identifier =b
binary op =+
identifier =c
punct =;
keyword =if
punct =(
identifier =a
relational op=<=
identifier =m
punct =)

identifier =a
unary op =++
punct =;
comment =/*dfdfdf*/
punct =}
symbol table
main
a
b
c
m
[root@localhost ~]# cat b.txt
main()
{
int a,b;
a=b+c;
if(a<=m)
a++;
/*dfdfdf*/
}
[root@localhost ~]#

Compiler Design_LEX Tool for Lexical Analysis.pptx

More Related Content

Similar to Compiler Design_LEX Tool for Lexical Analysis.pptx

More from RushaliDeshmukh2

Recently uploaded

Compiler Design_LEX Tool for Lexical Analysis.pptx