2. What is Lex?
• Lex/Flex lets you specify a lexical analyzer (LA) by
writing regular definitions that describe the patterns for tokens.
• The input notation for the Lex tool is referred to as
the Lex language; the tool itself is the Lex compiler.
• The Lex compiler transforms the input patterns into a
transition diagram (TD) and generates code, in a file called
lex.yy.c, that simulates the TD.
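To make "code that simulates the TD" concrete, here is a minimal hand-written sketch in C of a transition diagram for the identifier pattern [a-zA-Z][a-zA-Z0-9]*. It is not what Lex actually emits (the generated lex.yy.c is table-driven and far more general); the state names and the functions td_step and match_id are hypothetical, chosen only for illustration.

```c
#include <ctype.h>
#include <stddef.h>

/* States of a hypothetical transition diagram for [a-zA-Z][a-zA-Z0-9]* */
enum { S_START, S_IN_ID, S_DEAD };

/* One step of the diagram: next state for the current state and character. */
static int td_step(int state, unsigned char c)
{
    switch (state) {
    case S_START: return isalpha(c) ? S_IN_ID : S_DEAD;
    case S_IN_ID: return isalnum(c) ? S_IN_ID : S_DEAD;
    default:      return S_DEAD;
    }
}

/* Returns the length of the longest identifier prefix of s, or 0 if none. */
size_t match_id(const char *s)
{
    int state = S_START;
    size_t i = 0, last_accept = 0;

    while (s[i] != '\0') {
        state = td_step(state, (unsigned char)s[i]);
        if (state == S_DEAD)
            break;
        i++;
        if (state == S_IN_ID)      /* S_IN_ID is the only accepting state */
            last_accept = i;
    }
    return last_accept;
}
```

Calling match_id on an input that starts with a letter returns how many characters the identifier spans; an input that starts with a digit yields 0 because the start state has no transition on digits.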
3. How Lex Works?
lex.l → Lex Compiler → lex.yy.c
lex.yy.c → C Compiler → a.out
Input Stream → a.out → Sequence of Tokens
4. Lex in Detail
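[Figure: the user supplies (regex + action) rules to Lex, which generates yylex() { ... }. GCC compiles yylex() together with optional driver code, and the resulting program produces tokens.]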
5. Lex Specification
A Lex program has the following form:
declarations
%%
translation rules
%%
driver functions
Note: %% on its own is the smallest possible Lex program.
6. Declarations
A series of definitions of the form
name definition
E.g. DIGIT [0-9]
COMMENTSTART "*"
ID [a-zA-Z][a-zA-Z0-9]*
7. Contd...
Code that must be copied verbatim into lex.yy.c
should be written inside %{ ... %}.
E.g.
%{
//comment
#include<stdio.h>
%}
8. Translation Rules
The rules portion of a Lex program contains a
sequence of rules of the form
pattern action
E.g.
{letter}({letter}|{digit})* return id;
{digit}+ return num;
Note: the action must begin on the same line as its pattern.
10. Compiling Lex
Installation: sudo apt-get install flex
Step-1: lex demo.l
Step-2: gcc lex.yy.c -ll
Step-3: ./a.out
Note: -ll links the Lex library, which supplies the default yywrap().
With flex, the equivalent library is usually linked with -lfl.
11. Lex Functions
yylex(): each invocation resumes scanning where the
previous one left off; returns 0 on EOF
yytext: buffer holding the characters that matched
the pattern (char *yytext)
yyleng: length of the matched lexeme (an integer)
yyin: the input stream pointer (FILE *)
yyout: the output stream pointer (FILE *)
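The contract described above (resume where the last call stopped, return 0 at end of input, expose the matched text and its length) can be sketched without Lex at all. The following is a minimal hand-written stand-in, not the real generated scanner: scan_init, scan_next, tok_text, tok_len and the token codes are hypothetical names, with tok_text and tok_len playing the roles of yytext and yyleng. It assumes only two token kinds, identifiers and numbers, and skips everything else.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

enum { TOK_EOF = 0, TOK_ID = 1, TOK_NUM = 2 };   /* hypothetical token codes */

static const char *cursor = "";   /* where the next call resumes */
char   tok_text[64];              /* plays the role of yytext */
size_t tok_len;                   /* plays the role of yyleng */

/* Point the scanner at a new input string. */
void scan_init(const char *input) { cursor = input; }

/* Returns the next token code, or TOK_EOF (0) when the input is exhausted. */
int scan_next(void)
{
    const char *start;
    int kind;

    /* Skip anything that cannot start an identifier or a number. */
    while (*cursor != '\0' && !isalnum((unsigned char)*cursor))
        cursor++;
    if (*cursor == '\0')
        return TOK_EOF;

    start = cursor;
    if (isalpha((unsigned char)*cursor)) {
        kind = TOK_ID;
        while (isalnum((unsigned char)*cursor)) cursor++;
    } else {
        kind = TOK_NUM;
        while (isdigit((unsigned char)*cursor)) cursor++;
    }

    tok_len = (size_t)(cursor - start);
    if (tok_len >= sizeof tok_text)       /* truncate overly long lexemes */
        tok_len = sizeof tok_text - 1;
    memcpy(tok_text, start, tok_len);
    tok_text[tok_len] = '\0';
    return kind;
}
```

A driver would call scan_init once and then loop while scan_next() returns a nonzero code, reading tok_text and tok_len after each call, which mirrors the usual loop around yylex().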
12. Conflict Resolution in Lex
A conflict arises when more than one pattern
matches the input. Lex resolves it with two rules:
a. The longest match is chosen.
b. If several rules match a lexeme of that same length,
the rule listed first is chosen.
13. Contd...
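E.g. Input: CS335
(CS) {printf("Department");}
(CS)[0-9]{3} {printf("Course");}
[a-zA-Z]+[0-9]+ {printf("AnythingElse");}
Output: Course
(rules 2 and 3 both match all five characters; rule 2 is listed first)
E.g.
Input: CS3351 Output: AnythingElse
(only rule 3 matches all six characters, so the longest match wins)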
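The two resolution rules can be checked with a small simulation. The sketch below hand-codes a matcher for each of the three patterns in the example, reading the second pattern as (CS)[0-9]{3}, and then picks the longest match, keeping the earlier rule on a tie. It illustrates the selection policy only; Lex itself does this inside one combined automaton. The names m_cs, m_cs3, m_alnum and resolve are hypothetical.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Each matcher returns the match length at the start of s (0 = no match). */
static size_t m_cs(const char *s)          /* (CS) */
{
    return strncmp(s, "CS", 2) == 0 ? 2 : 0;
}

static size_t m_cs3(const char *s)         /* (CS)[0-9]{3} */
{
    if (strncmp(s, "CS", 2) != 0) return 0;
    for (int i = 2; i < 5; i++)
        if (!isdigit((unsigned char)s[i])) return 0;
    return 5;
}

static size_t m_alnum(const char *s)       /* [a-zA-Z]+[0-9]+ */
{
    size_t i = 0, letters, digits;
    while (isalpha((unsigned char)s[i])) i++;
    letters = i;
    while (isdigit((unsigned char)s[i])) i++;
    digits = i - letters;
    return (letters > 0 && digits > 0) ? i : 0;
}

struct rule { size_t (*match)(const char *); const char *action; };

/* Rules in the same order as they are listed in the example. */
static const struct rule rules[] = {
    { m_cs, "Department" }, { m_cs3, "Course" }, { m_alnum, "AnythingElse" },
};

/* Returns the action label of the winning rule, or NULL if no rule matches. */
const char *resolve(const char *input)
{
    const char *winner = NULL;
    size_t best = 0;

    for (size_t i = 0; i < sizeof rules / sizeof rules[0]; i++) {
        size_t len = rules[i].match(input);
        if (len > best) {          /* strict '>' keeps the earlier rule on ties */
            best = len;
            winner = rules[i].action;
        }
    }
    return winner;
}
```

The strict comparison is the whole tie-breaking policy: a later rule replaces the current winner only when its match is strictly longer.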
14. Some More Translation Rules
[ \t\n] ; (no action: whitespace is skipped)
[a-z]+ ECHO; (display the matched lexeme)
. (any character except \n)
[^a-zA-Z] (any character that is not a letter)
ab?c (ac or abc)
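Patterns like these can be tried out quickly in C with the POSIX regex API, as a rough stand-in for Lex. This is a sketch under assumptions: POSIX extended regular expressions share the basic operators (character classes, +, ?) with Lex but are not identical to it (for example, Lex's {name} references and its treatment of . and newline differ), and full_match is a hypothetical helper that anchors the pattern so that it must match the whole string.

```c
#include <regex.h>
#include <stdio.h>

/* Returns 1 if all of s matches the POSIX extended regex `pattern`,
   0 if it does not, and -1 if the pattern is too long or fails to compile. */
int full_match(const char *pattern, const char *s)
{
    char anchored[256];
    regex_t re;
    int n, matched;

    /* Wrap the pattern as ^(pattern)$ so that partial matches are rejected. */
    n = snprintf(anchored, sizeof anchored, "^(%s)$", pattern);
    if (n < 0 || (size_t)n >= sizeof anchored)
        return -1;
    if (regcomp(&re, anchored, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;
    matched = (regexec(&re, s, 0, NULL, 0) == 0);
    regfree(&re);
    return matched;
}
```

For instance, full_match("ab?c", "ac") and full_match("ab?c", "abc") both succeed, while full_match("ab?c", "abbc") does not, which matches the "ac or abc" reading of the pattern.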