Module 4
Lexical Analyser
Compilation Sequence
Creating a Lexical Analyzer with Lex:
1. First, a specification of a lexical analyzer is prepared by
creating a program lex.l in the Lex language.
2. lex.l is run through the Lex compiler to produce a C
program, lex.yy.c.
3. Finally, lex.yy.c is run through the C compiler to produce
an object program, a.out.
Creating a Lexical Analyzer with Lex (flow):
Lex source program (lex.l) -> Lex compiler -> lex.yy.c
lex.yy.c -> C compiler -> a.out
The lexical analysis phase reads the characters of the source program
and groups them into a stream of tokens. A token may be:
an identifier,
a keyword (if, while, etc.),
a punctuation character, or
a multi-character operator like :=
The general format of Lex source is:
%{
Definitions
%}
%%
{rules}
%%
{user subroutines}
Characters that form the regular expressions:
Regular Expression    Description
A-Z, 0-9, a-z         Characters and numbers that form part of the pattern.
.                     Matches any single character except \n.
*                     Matches zero or more occurrences of the preceding
                      pattern or expression.
                      Example: [0-9]*
+                     Matches one or more occurrences of the preceding
                      pattern.
                      Example: [a-z]+
?                     Matches zero or one occurrence of the preceding
                      pattern or expression.
                      Example: -?[0-9]* : the digits may be preceded by
                      a - sign
^                     1. Matches the beginning of a line when it is the
                         first character of the pattern.
                         Example: ^verb matches "verb" only at the start
                         of a line
                      2. Negates a character class when it is the first
                         character inside the brackets.
                         Example: [^0-9]+ matches characters other
                         than 0-9
[ ]                   A character class. Matches any one character in the
                      brackets. A - inside the brackets denotes a range.
                      Example: [A-Z] matches any one character from A to Z.
$                     Matches the end of a line when it is the last
                      character of the pattern.
                      Example: a+b$
{ }                   Indicates how many times a pattern can be present.
                      Example: A{1,3} means one to three occurrences of A
                      may be present.
|                     Logical OR between expressions.
\                     Used to escape metacharacters, that is, to remove the
                      special meaning of characters as defined in this table.
                      Example: \"[a-z]+\"
" "                   The string written in quotes matches literally.
                      Example: "hello"
/                     Lookahead. Matches the preceding pattern only if it is
                      followed by the succeeding expression.
                      Example: A0/1 matches A0 only if the input is A01.
( )                   Groups a series of regular expressions.
                      Example: ([0-9]+)|([0-9]*\.[0-9]+)
LEX Actions:
Action                Description
BEGIN                 Switches the scanner to the named start state. The
                      lexical analyzer starts in state 0.
ECHO                  Emits the matched input as it is.
char *yytext          When the lexer recognizes a token in the input, the
                      lexeme is stored in a null-terminated string called
                      yytext.
FILE *yyin            The file the scanner reads from. By default it is the
                      standard input.
FILE *yyout           The file the scanner writes to. By default it is the
                      standard output.
int yyleng            Stores the length (number of characters) of the
                      matched lexeme held in yytext.
yylex( )              The scanning function. When yylex( ) is called, the
                      scanner starts scanning the source program.
yywrap( )             Called when the scanner encounters the end of file.
                      A return value of 1 means there is no more input.
yylval                Holds the value associated with the token.
Some examples of regular expressions and their meanings are given in the
following table.
Regular Expression    Meaning
joke[rs]              Matches either jokes or joker.
A{1,2}shis+           Matches Ashis, AAshis, Ashiss, AAshiss, and so on:
                      one or two A's, then shi, then one or more s.
                      It does not match Ashi or AAshi.
(A[b-e])?             Matches zero or one occurrence of A followed by
                      any character from b to e.
• Tokens in Lex are declared like variable names in C.
• Every token has an associated expression.
• A declared name is referenced inside braces, as in {chars}.
Token       Associated Expression    Meaning
number      ([0-9])+                 1 or more occurrences of a digit
chars       [A-Za-z]                 Any letter
blank       " "                      A blank space
word        ({chars})+               1 or more occurrences of chars
variable    ({chars})+({number})*({chars})*({number})*
                                     1 or more chars, optionally followed
                                     by numbers, chars, and numbers
