System Programming
Walchand Institute of Technology
Aim:
Scanner using LEX
Theory:
Language processor development tools (LPDTs) focus on generating the analysis phase of language processors.
Figure 1 shows a schematic of an LPDT which generates the analysis phase of a language processor whose source language is L. The LPDT requires the following two inputs:
1. Specification of a grammar of language L
2. Specification of semantic actions to be performed in the analysis phase.
Figure 1: A Language Processor Development Tool (LPDT)
The LPDT generates programs that perform lexical, syntax and semantic analysis of the source program and construct the intermediate representation (IR). These programs collectively form the analysis phase of the language processor.
Sunita M. Dol, CSE Dept
Walchand Institute of Technology, Solapur
HANDOUT#02
Two LPDTs are widely used in practice: the lexical analyzer generator LEX and the parser generator YACC. The input to these tools is a specification of the lexical and syntactic constructs of L, and the semantic actions to be performed on recognizing the constructs. The specification consists of a set of translation rules of the form
<string specification> {<semantic action>}
where <semantic action> consists of C code. This code is executed when a string matching <string specification> is encountered in the input. LEX and YACC generate C programs which contain the code for scanning and parsing, respectively, and the semantic actions contained in the specification.
A YACC-generated parser can use a LEX-generated scanner as a routine if the scanner and parser use the same conventions concerning the representation of tokens.
Figure 2 shows a schematic for developing the analysis phase of a compiler for language L using LEX and YACC. The analysis phase processes the source program to build an intermediate representation.
Figure 2: Using LEX and YACC
LEX
LEX accepts an input specification which consists of two components. The first component is a specification of strings representing the lexical units in L, e.g. identifiers and constants. This specification is in the form of regular expressions. The second component is a specification of semantic actions aimed at building an IR. The IR consists of a set of tables of lexical units and a sequence of tokens for the lexical units occurring in a source statement. Accordingly, the semantic actions make new entries in the tables and build tokens for the lexical units.
Example
Figure 3 shows a sample input to LEX. The input consists of four components, three of which are shown here. The first component (enclosed by %{ and %}) defines the symbols used in specifying the strings of L. It defines the symbol letter to stand for any upper or lower case letter, and digit to stand for any digit. The second component, enclosed between %% and %%, contains the translation rules. The third component contains auxiliary routines which can be used in the semantic actions.
Figure 3: A sample LEX specification
The sample input in Figure 3 defines the strings begin, end, := (the assignment operator), and the identifier and constant strings of L. When an identifier is found, it is entered in the symbol table (if not already present) using the routine enter_id. The pair (ID, entry #) forms the token for the identifier string. By convention, entry # is put in the global variable yylval, and the class code ID is returned as the value of the call on the scanner. Similar actions are taken on finding a constant, the keywords begin and end, and the assignment operator.
Compiling the Lexical Analyzer
To compile a lex program, do the following:
1. Use the lex program to change the specification file into a C language
program. The resulting program is in the lex.yy.c file.
2. Use the cc command with the -ll flag to compile and link the program with a
library of lex subroutines. The resulting executable program is in the a.out
file.
For example, if the lex specification file is called lextest, enter the following
commands:
lex lextest
cc lex.yy.c -ll
Program:
%{
#include <stdio.h>
%}
delim    [ \t\n]
ws       {delim}+
letter   [A-Za-z]
digit    [0-9]
id       {letter}({letter}|{digit})*
number   {digit}+(\.{digit}+)?(E[+-]?{digit}+)?
%%
{ws}     { /* skip white space */ }
if       { printf("%s : Keyword\n", yytext); /* listed before {id}, so it wins ties */ }
then     { printf("%s : Keyword\n", yytext); }
else     { printf("%s : Keyword\n", yytext); }
{id}     { printf("%s : Identifier\n", yytext); }
{number} { printf("%s : Number\n", yytext); }
"<"|"<="|"=="|"<>"|">"|">="  { printf("%s : Relational operator\n", yytext); }
%%
int main(void)
{
    yylex();    /* scan standard input until end of file */
    return 0;
}
Input:
if
Output:
if : Keyword
Conclusion:
The lex command generates a C language program that analyzes an input stream using the information in the specification file, and stores this program in the file lex.yy.c.
Compiling lex.yy.c and linking it with the lex library (-ll) produces an executable lexical analyzer.
Thus, a lexical analyzer (scanner) was implemented using the lexical analyzer generator LEX.
