System Programming Sunita M. Dol, CSE Dept
Walchand Institute of Technology, Solapur
HANDOUT#02
Aim:
Scanner using LEX
Theory:
This handout considers language processor development tools (LPDTs), focusing on
generation of the analysis phase of language processors.
Figure 1 shows a schematic of an LPDT which generates the analysis phase of a
language processor whose source language is L. The LPDT requires the following
two inputs:
1. Specification of a grammar of language L
2. Specification of semantic actions to be performed in the analysis phase.
Figure 1: A Language Processor Development Tool (LPDT)
The LPDT generates programs that perform lexical, syntax and semantic analysis
of the source program and construct the intermediate representation (IR). These
programs collectively form the analysis phase of the language processor.
Two LPDTs are widely used in practice. These are the lexical analyzer generator
LEX and the parser generator YACC. The input to these tools is a specification
of the lexical and syntactic constructs of L, and the semantic actions to be
performed on recognizing the constructs. The specification consists of a set of
translation rules of the form
<string specification> {<semantic action>}
where <semantic action> consists of C code. This code is executed when a string
matching <string specification> is encountered in the input. LEX and YACC
generate C programs which contain the code for scanning and parsing,
respectively, and the semantic actions contained in the specification.
A YACC generated parser can use a LEX generated scanner as a routine if the
scanner and parser use the same conventions concerning the representation of
tokens. Figure 2 shows a schematic for developing the analysis phase of a
compiler for language L using LEX and YACC. The analysis phase processes the
source program to build an intermediate representation.
Figure 2: Using LEX and YACC
LEX
LEX accepts an input specification which consists of two components. The first
component is a specification of strings representing the lexical units in L,
e.g. ids and constants. This specification is in the form of regular
expressions. The second component is a specification of semantic actions aimed
at building an IR. The IR consists of a set of tables of lexical units and a
sequence of tokens for the lexical
units occurring in a source statement. Accordingly, the semantic actions make
new entries in the tables and build tokens for the lexical units.
Example
Figure 3 shows a sample input to LEX. The input consists of four components,
three of which are shown here. The first component (enclosed by %{ and %})
defines the symbols used in specifying the strings of L. It defines the symbol
letter to stand for any upper or lower case letter, and digit to stand for any
digit. The second component, enclosed between %% and %%, contains the
translation rules. The third component contains auxiliary routines which can be
used in the semantic actions.
Figure 3: A sample LEX specification
The sample input in Figure 3 defines the strings begin, end, := (the assignment
operator), and the identifier and constant strings of L. When an identifier is
found, it is entered in the symbol table (if not already present) using the
routine enter_id. The pair (ID, entry #) forms the token for the identifier
string. By convention, entry # is put in the global variable yylval, and the
class code ID is returned as the value of the call on the scanner. Similar
actions are taken on finding a constant, the keywords begin and end, and the
assignment operator.
Compiling the Lexical Analyzer
To compile a lex program, do the following:
1. Use the lex program to change the specification file into a C language
program. The resulting program is in the lex.yy.c file.
2. Use the cc command with the -ll flag to compile and link the program with a
library of lex subroutines. The resulting executable program is in the a.out
file.
For example, if the lex specification file is called lextest, enter the following
commands:
lex lextest
cc lex.yy.c -ll
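Program:
%{
/* Scanner that labels keywords, identifiers, numbers and relational operators.
   The keyword rules come before {id} so that they win on equal-length matches. */
#include <stdio.h>
%}
delim [ \t\n]
ws {delim}+
letter [A-Za-z]
digit [0-9]
id {letter}({letter}|{digit})*
number {digit}+(\.{digit}+)?(E[+-]?{digit}+)?
%%
{ws} { /* skip white space */ }
if {printf("%s : Keyword\n",yytext);}
then {printf("%s : Keyword\n",yytext);}
else {printf("%s : Keyword\n",yytext);}
{id} {printf("%s : Identifier\n",yytext);}
{number} {printf("%s : Number\n",yytext);}
"<"|"<="|"=="|"<>"|">"|">=" {printf("%s : Relational operator\n",yytext);}
%%
int main(void)
{
yylex();
return 0;
}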
Input:
if
Output:
if : Keyword
Conclusion:
The lex command generates a C language program that can analyze an input
stream using information in the specification file.
The lex command stores the output program in a lex.yy.c file.
If the output program recognizes a simple, one-word input structure, the
lex.yy.c output file can be compiled to produce an executable lexical analyzer.
Thus a scanner (lexical analyzer) is implemented using the lexical analyzer
generator LEX.