Compiler Design_LEX Tool for Lexical Analysis.pptx
A tool widely used to specify lexical analyzers for a variety of languages or the breaking up of an input stream into meaningful units, or tokens.
For example, consider breaking a text file up into individual words.
LEX TOOL
Atool widely used to specify lexical
analyzers for a variety of languages or the
breaking up of an input stream into
meaningful units, or tokens.
For example, consider breaking a text file
up into individual words.
We refer to the tool as Lex compiler and to
its input specification as the Lex language.
3.
LEX TOOL LexicalAnalyzer Generator
Lexical
Compiler
Lex
Source
program
lex.l
lex.yy.c
C
compiler
lex.yy.
c
a.out
a.out
Input
stream
Sequence
of tokens
Internal Structure ofLex
Lex
Regular
expressions
NFA DFA
Minimal
DFA
The final states of the DFA are
associated with actions
6.
Declaration
The declarations section
includesdeclarations of
variables,manifest constants(A
manifest constant is an
identifier that is declared to
represent a constant e.g.
# define PIE 3.14), the files to be
included. (written in %{ %} )
and definitions of regular
7.
Translation rules
The translationrules of a Lex program are
statements of the form :
p1 {action 1}
p2 {action 2}
p3 {action 3}
… …
where each p is a regular expression and
each action is a program fragment
describing what action the lexical
analyzer should take when a string
pattern matches with p matches a lexeme.
In Lex the actions are written in C.
8.
Auxiliary procedures
The thirdsection holds whatever
auxiliary procedures are needed by the
actions.
Alternatively these procedures can be
compiled separately and loaded with
the lexical analyzer. The auxiliary
procedures are written in C language.
9.
How does thisLexical analyzer work?
The lexical analyzer created by Lex behaves
in concert with a parser in the following
manner.
When activated by the parser, the lexical
analyzer begins reading its remaining
input , one character at a time, until it has
found the longest prefix of the input that is
matched by one of the regular expressions p
contd…
10.
How does thisLexical analyzer work?
then it executes the corresponding action. Typically
the action will return control to the parser. However, if
it does not, then the lexical analyzer proceeds to find
more lexemes, until an action causes control to
return to the parser. The repeated search for lexemes
until an explicit return allows the lexical analyzer to
process white space and comments conveniently.
The lexical analyzer returns a single quantity,
the token, to the parser. To pass an attribute
value with information about the lexeme, we
can set the global variable yylval.
contd…
11.
e.g. Supposethe lexical analyzer
returns a single token for all the
relational operators, in which case
the parser won’t be able to
distinguish between
”<=”, ”>=”, ”<”, ”>”, ”==” etc. We
can set yylval appropriately to
specify the nature of the operator.
12.
The two variablesyytext and yyleng
Lex makes the lexeme available to the
routines appearing in the third
section through two variables yytext
and yyleng
1) yytext is a variable that is a pointer
to the first character of the lexeme.
2)yyleng is an integer telling how long
the lexeme is.
13.
A lexeme maymatch more than one
patterns. How is this problem resolved?
Take for example the lexeme if. It matches the
patterns for both keyword if and identifier. If the
pattern for keyword if precedes the pattern for
identifier in the declaration list of the lex program
the conflict is resolved in favour of the keyword.
In general this ambiguity-resolving strategy makes
it easy to reserve keywords by listing them ahead of
the pattern for identifiers.
The Lex’s strategy of selecting the longest
prefix matched by a pattern makes it easy to
resolve other conflicts like the one between “<” and
“<=”.
14.
Regular Expression Basics
. : matches any single character except n
* : matches 0 or more instances of the preceding regular
expression
+ : matches 1 or more instances of the preceding regular
expression
? : matches 0 or 1 instance of the preceding regular
expression
| : matches the preceding or following regular
expression
[ ] : defines a character class
() : groups enclosed regular expression into a new
regular expression
“…”: matches everything within the “ “ literally
15.
Regular Expression Basics
x|y: x or y
{i} : definition of i
x/y : x, only if followed by y (y not removed
from input)
x{m,n} : m to n occurrences of x
x :x, but only at beginning of line
x$ : x, but only at end of line
“s” : exactly what is in the quotes
A regular expression finishes with a space,
tab or
newline.
16.
Pattern Matching Examples
ExpressionMatches
abc* ab abc abcc abccc ...
abc+ abc abcc abccc ...
a(bc)+ abc abcbc abcbcbc ...
a(bc)? a abc
[abc] one of: a, b, c
[a-z] any letter, a-z
[a-z] one of: a, -, z
[-az] one of: -, a, z
[A-Za-z0-9]+ one or more alphanumeric
characters
[ tn]+ whitespace
[^ab] anything except: a, b
[a^b] one of: a, ^, b
17.
Lex Predefined Variables
NameFunction
int yylex(void) call to invoke lexer, returns
token
char *yytext pointer to matched string
yyleng length of matched string
yylval value associated with token
int yywrap(void) wrapup, return 1 if done, 0 if
not done
FILE *yyout output file
FILE *yyin input file
ECHO write matched string
18.
Usage
To runLex on a source file, type
lex scanner.l
It produces a file named lex.yy.c which is a C
program for the lexical analyzer.
To compile lex.yy.c, type
gcc lex.yy.c
To run the lexical analyzer program, type
1) ./a.out <inputfile
2) ./a.out
3) ./a.out <inputfile >outputfile
19.
Program to countno. of words
%{
int counter = 0;
%}
letter [a-zA-Z]
%%
{letter}+ {printf(“a wordn”); counter++;}
%%
main() {
yylex();
printf(“There are total %d wordsn”, counter);
}
int yywrap(void) {
return 1;
}
[root@localhost ~]# lex a1.l
[root@localhost ~]# gcc
lex.yy.c
[root@localhost ~]# ./a.out
pune
a word
bombay
a word
20.
Program to countno. of identifiers
digit [0-9]
letter [a-zA-Z]
%{
# include<stdio.h>
%}
%%
{letter}({letter}|{digit})* {printf(“id: %sn”, yytext);}
n{printf(“new linen”);}
%%
main() {
yylex();
}
int yywrap(void) {
return 1;
}
21.
[root@localhost ~]# lexa3.l
[root@localhost ~]# gcc -o a3
lex.yy.c
[root@localhost ~]# ./a3
abc ghg jkhjk
id: abc
id: ghg
id: jkhjk
new line
[root@localhost ~]# ./a3 <a.txt
id: Today
id: is
id: Friday
new line
id: Environment
id: is
id: good
new line
[root@localhost ~]# cat
a.txt
Today is Friday
Environment is good
22.
Program to countno. of characters,
words and lines in a program
%{
int nchar=0, nword=0, nline=0;
%}
%%
n { nline++; nchar++; }
[^ tn]+ { nword++, nchar += yyleng; }
. { nchar++; }
%%
int main(void) {
yylex();
printf("%dt%dt%dn", nchar, nword, nline);
return 0;
}
23.
[root@localhost ~]# lexa2.l
[root@localhost ~]# gcc -o a2
lex.yy.c
[root@localhost ~]# ./a2
hello good morning
how r u
28 6 2
[root@localhost ~]#
24.
Program to countno. of lines, tabs, characters,spaces
and words in a program
%{
# include<stdio.h>
int lc=0,tc=0,sc=0,wc=0,cc=0;
%}
%%
[^ tn]+ {cc+=yyleng; wc++;}
[ ] {cc++;sc++;}
[t] {cc++; tc++;}
n {cc++; lc++;}
%%
main()
{
yyin = fopen("a1.txt","r");
yyout=fopen("a2.txt","w");
yylex();
fclose(yyin);
fclose(yyout);
}
int yywrap()
{
fprintf(yyout,"no. of lines =%dn",lc);
fprintf(yyout,"no. of tabs=%dn",tc);
fprintf(yyout,"no. of characters = %d
n",cc);
fprintf(yyout,"no. of spaces = %d
n",sc);
fprintf(yyout,"no. of words=%dn",wc);
return 1;
}
25.
[root@localhost ~]# lexl1.l
[root@localhost ~]# gcc -o l1 lex.yy.c
[root@localhost ~]# ./l1
[root@localhost ~]# cat a1.txt
Today is Friday
Environment is good
[root@localhost ~]# cat a2.txt
no. of lines =2
no. of tabs=4
no. of characters = 39
no. of spaces = 3
no. of words=6
[root@localhost ~]#
26.
%{
# include<stdio.h>
int pcount=0,ncount=0;
%}
pos_num[+]?[0-9]+(.[0-9]+)?([eE][-+]?[0-9]+)?
neg_num [-][0-9]+(.[0-9]+)?([eE][-+]?[0-9]+)?
%%
{pos_num} {printf("positive number = %sn",yytext);pcount+
+;}
{neg_num} {printf("negative number = %sn",yytext);ncount+
+;}
.|n {}
%%
main()
{
yylex();
}
int yywrap()
{
printf("no. of positive numbers = %dn",pcount);
printf("no. of negative numbers = %dn",ncount);
Program to count positive and negative numbers in a program
27.
[root@localhost ~]# lexnum.l
[root@localhost ~]# gcc -o num lex.yy.c
[root@localhost ~]# ./num
23 -90
positive number = 23
negative number = -90
90.78E+4
positive number = 90.78E+4
no. of positive numbers = 2
no. of negative numbers = 1
[root@localhost ~]#
[root@localhost ~]# lexla.l
[root@localhost ~]# gcc -o la lex.yy.c
[root@localhost ~]# ./la <b.txt
identifier =main
punct =(
punct =)
punct ={
keyword =int
identifier =a
punct =,
identifier =b
punct =;
identifier =a
binary op ==
identifier =b
binary op =+
identifier =c
punct =;
keyword =if
punct =(
identifier =a
relational op=<=
identifier =m
punct =)
[root@localhost ~]# lex la.l
[root@localhost ~]# gcc -o la lex.yy.c
[root@localhost ~]# ./la <b.txt
identifier =main
punct =(
punct =)
punct ={
keyword =int
identifier =a
punct =,
identifier =b
punct =;
identifier =a
binary op ==
identifier =b
binary op =+
identifier =c
punct =;
keyword =if
punct =(
identifier =a
relational op=<=
identifier =m
punct =)
[root@localhost ~]# lex la.l
[root@localhost ~]# gcc -o la lex.yy.c
[root@localhost ~]# ./la <b.txt
identifier =main
punct =(
punct =)
punct ={
keyword =int
identifier =a
punct =,
identifier =b
punct =;
identifier =a
binary op ==
identifier =b
binary op =+
identifier =c
punct =;
keyword =if
punct =(
identifier =a
relational op=<=
identifier =m
punct =)
33.
identifier =a
unary op=++
punct =;
comment =/*dfdfdf*/
punct =}
symbol table
main
a
b
c
m
[root@localhost ~]# cat b.txt
main()
{
int a,b;
a=b+c;
if(a<=m)
a++;
/*dfdfdf*/
}
[root@localhost ~]#