COMPILED BY: RITURAJ JAIN
LEX – Lexical Analyzer Generator
Lex – A Lexical Analyzer Generator
• A Unix utility dating from the mid-1970s
• A tool widely used to specify lexical analyzers for a variety of languages
• We refer to the tool as the Lex compiler, and to its input specification as the Lex language.
• The Lex compiler takes as its source a specification of the tokens/patterns of a language and generates a lexical analyzer program in C.
Lex source program (lex.l) -> Lex compiler -> lex.yy.c
lex.yy.c -> C compiler -> a.out
Input stream -> a.out -> Sequence of tokens
• The Lex compiler generates lex.yy.c, which defines a routine yylex().
Format of a Lexical Specification
Lex file format (lex.l):
DECLARATIONS
%%
TRANSLATION RULES
%%
AUXILIARY PROCEDURES
• The lex input file consists of three sections, separated by lines containing only %%.
Definitions Section
• The definitions section serves two purposes.
• First, it sets up the C environment for the generated lexer.
• This part of the Lex specification is delimited by %{ and %}.
• It contains C code, such as global declarations, #include directives for library headers, and other declarations, which is copied into the lexical analyzer (i.e. lex.yy.c) when the specification passes through the lex tool.
• Second, the definitions section gives the lex tool the information it needs to translate the Lex specification into a lexical analyzer.
• It declares simple name definitions (regular definitions) that simplify the scanner specification.
• Name definitions have the form:
name definition
• Example:
DIGIT [0-9]
ID [a-z][a-z0-9]*
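To make the ID definition concrete, here is a minimal hand-written C sketch of what the pattern [a-z][a-z0-9]* accepts. The function name match_id and the helpers are hypothetical and are not part of Lex; Lex generates its own matching code from the definition.

```c
#include <assert.h>
#include <stddef.h>

/* Hand-written counterpart of DIGIT [0-9]. */
static int is_digit(char c) { return c >= '0' && c <= '9'; }

/* The [a-z] part of ID (assumes a character set with contiguous letters). */
static int is_lower(char c) { return c >= 'a' && c <= 'z'; }

/* Length of the longest prefix of s that matches ID [a-z][a-z0-9]*,
   or 0 if s does not start with an identifier. */
size_t match_id(const char *s)
{
    assert(s != NULL);
    if (!is_lower(s[0]))
        return 0;               /* first character must be a letter */
    size_t n = 1;
    while (is_lower(s[n]) || is_digit(s[n]))
        n++;                    /* then any mix of letters and digits */
    return n;
}
```

For instance, calling match_id on "abc123 x" consumes the six characters of abc123 and stops at the space.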
Rules Section
• The rules section of the lex input contains a series of rules of the form:
pattern1 {action1}
pattern2 {action2}
• Each pattern is a regular expression; the lexer matches the longest possible string.
• Once a pattern is matched, the corresponding action is invoked.
• The action consists of ordinary C statements enclosed in { and }.
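The longest-match behaviour can be sketched in plain C. This is only an illustration under stated assumptions: the rule table, the token codes (TOK_IF, TOK_ID, TOK_NUM) and the function next_token are hypothetical, and a token code stands in for the action. It also shows the usual tie-break, where the rule listed first wins when two patterns match the same length.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical token codes; each one stands in for a rule's action. */
enum { TOK_NONE = 0, TOK_IF = 1, TOK_ID = 2, TOK_NUM = 3 };

/* A matcher returns the length of the matched prefix, or 0 for no match. */
typedef size_t (*matcher_fn)(const char *s);

struct rule {
    matcher_fn match;
    int token;
};

static size_t match_if(const char *s)
{
    return strncmp(s, "if", 2) == 0 ? 2 : 0;
}

static size_t match_id(const char *s)      /* [a-z][a-z0-9]* */
{
    size_t n = 0;
    if (!islower((unsigned char)s[0]))
        return 0;
    while (islower((unsigned char)s[n]) || isdigit((unsigned char)s[n]))
        n++;
    return n;
}

static size_t match_num(const char *s)     /* [0-9]+ */
{
    size_t n = 0;
    while (isdigit((unsigned char)s[n]))
        n++;
    return n;
}

/* Rules in specification order: earlier entries win ties. */
static const struct rule rules[] = {
    { match_if,  TOK_IF  },
    { match_id,  TOK_ID  },
    { match_num, TOK_NUM },
};

/* Picks the rule with the longest match at s; stores the lexeme length. */
int next_token(const char *s, size_t *len)
{
    assert(s != NULL && len != NULL);
    int best_tok = TOK_NONE;
    size_t best_len = 0;
    for (size_t i = 0; i < sizeof rules / sizeof rules[0]; i++) {
        size_t n = rules[i].match(s);
        if (n > best_len) {     /* strictly longer only, so ties keep the earlier rule */
            best_len = n;
            best_tok = rules[i].token;
        }
    }
    *len = best_len;
    return best_tok;
}
```

With this table, the input "if" is reported as TOK_IF because the keyword rule comes first, while "iffy" is reported as TOK_ID because the identifier pattern matches a longer string.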
• Example:
{ID} printf( "An identifier: %s\n", yytext );
• yytext holds the lexeme of the matched input string, and the yyleng variable holds its length.
• If the action is empty, the matched token is discarded.
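The role of yytext and yyleng, and the effect of an empty action, can be imitated in a few lines of C. The names lexeme_text, lexeme_len, set_lexeme and count_words are hypothetical stand-ins chosen for this sketch; they are not the variables or routines that Lex generates.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

#define MAX_LEXEME 64

/* Hypothetical stand-ins for lex's yytext and yyleng. */
static char lexeme_text[MAX_LEXEME];
static size_t lexeme_len;

/* Copy the first n characters of s into the lexeme buffer. */
static void set_lexeme(const char *s, size_t n)
{
    if (n >= MAX_LEXEME)
        n = MAX_LEXEME - 1;     /* this sketch truncates very long lexemes */
    memcpy(lexeme_text, s, n);
    lexeme_text[n] = '\0';
    lexeme_len = n;
}

/* Scans input with two rules: whitespace has an empty action and is
   discarded; any other run of characters is a word whose action counts it.
   Returns the number of words; the last word stays in lexeme_text. */
size_t count_words(const char *input)
{
    assert(input != NULL);
    size_t words = 0;
    const char *p = input;
    while (*p != '\0') {
        size_t n = 0;
        if (isspace((unsigned char)p[0])) {
            while (isspace((unsigned char)p[n]))
                n++;
            /* empty action: the matched whitespace is simply dropped */
        } else {
            while (p[n] != '\0' && !isspace((unsigned char)p[n]))
                n++;
            set_lexeme(p, n);   /* the action can read lexeme_text and lexeme_len */
            words++;
        }
        p += n;                 /* continue after the matched lexeme */
    }
    return words;
}
```

After scanning "foo  bar42\n", the buffer holds the last lexeme, bar42, with a length of 5; the whitespace between the words never reaches an action.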
Auxiliary Procedures
• The third section holds whatever auxiliary procedures the actions need; it is copied verbatim to lex.yy.c.
• Alternatively, these procedures can be compiled separately and loaded with the lexical analyzer.
• The auxiliary procedures are written in C.
• This section is optional; if it is missing, the second %% in the input file may be omitted.
• In the definitions and rules sections, any indented text or text enclosed in %{ and %} is copied verbatim to the output (with the %{ and %} delimiters removed).
Lex predefined variables
Format of a Lexical Specification
• Example:
digit [0-9]
letter [a-zA-Z]
%%
{letter}({letter}|{digit})* printf("id: %s\n", yytext);
\n printf("new line\n");
%%
int main(void) {
yylex();   /* run the scanner until end of input */
return 0;
}
Internal Structure of Lex
• The final states of the DFA are associated with actions.
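As a rough illustration of final states carrying actions, the following C sketch drives a small hand-built transition table. The states, character classes, action codes and the function run_dfa are all assumptions made for this example; Lex constructs its automaton from the regular expressions in the specification, and its generated tables look different.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

enum state  { S_START, S_ID, S_NUM, S_DEAD, N_STATES };
enum cls    { C_LETTER, C_DIGIT, C_OTHER, N_CLASSES };
enum action { A_NONE, A_ID, A_NUM };   /* hypothetical action codes */

/* Transition table: delta[state][character class] -> next state. */
static const enum state delta[N_STATES][N_CLASSES] = {
    /* S_START */ { S_ID,   S_NUM,  S_DEAD },
    /* S_ID    */ { S_ID,   S_ID,   S_DEAD },
    /* S_NUM   */ { S_DEAD, S_NUM,  S_DEAD },
    /* S_DEAD  */ { S_DEAD, S_DEAD, S_DEAD },
};

/* Each final state is associated with an action; A_NONE marks non-final states. */
static const enum action final_action[N_STATES] = {
    A_NONE, A_ID, A_NUM, A_NONE
};

static enum cls classify(char c)
{
    if (isalpha((unsigned char)c)) return C_LETTER;
    if (isdigit((unsigned char)c)) return C_DIGIT;
    return C_OTHER;
}

/* Runs the DFA on s, remembering the last final state seen (longest match).
   Returns that state's action and stores the matched length in *len. */
enum action run_dfa(const char *s, size_t *len)
{
    assert(s != NULL && len != NULL);
    enum state st = S_START;
    enum action last = A_NONE;
    size_t last_len = 0;
    for (size_t i = 0; s[i] != '\0'; i++) {
        st = delta[st][classify(s[i])];
        if (st == S_DEAD)
            break;                      /* no further match is possible */
        if (final_action[st] != A_NONE) {
            last = final_action[st];    /* remember the most recent final state */
            last_len = i + 1;
        }
    }
    *len = last_len;
    return last;
}
```

Running it on "x1+2" stops at the + and reports the identifier action for the two-character lexeme x1.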
