1. CS416 Compiler Design
Unit – I Syllabus
Introduction: Language processors,
The Structure of a Compiler,
The science of building a compiler
Lexical Analysis: The Role of the lexical analyzer,
Input buffering,
Specification of tokens,
Recognition of tokens,
The lexical analyzer generator: Lex,
Design of a Lexical Analyzer generator
2. Compiler Construction Tools
Writing a compiler is a difficult and time-consuming task. Specialized tools help
implement the various phases of a compiler. These tools are called compiler
construction tools.
Some of these tools are listed below:
1) Scanner generator: generates a lexical analyzer from regular expressions that describe the tokens.
Example: Lex
2) Parser generator: produces a syntax analyzer from a specification given as a context-free grammar (CFG).
Example: Yacc
3) Syntax-directed translation engine: helps produce the intermediate code generator
by walking the parse tree according to a syntax-directed definition (SDD).
4) Data-flow engine: helps produce the code optimizer by gathering information about
how values flow from one part of the program to another.
5) Automatic code generator: helps produce the code generator, which takes intermediate
code and converts it into equivalent machine code.
3. Lexical Analyzer Generator - Lex
Introduction:-
Lex is a Unix utility that generates lexical analyzers.
Lex allows us to specify a lexical analyzer by writing regular expressions
that describe the patterns for tokens.
The input notation for the Lex tool is referred to as the Lex language,
and the tool itself is the Lex compiler.
Behind the scenes, the Lex compiler transforms the input patterns into a
transition diagram and generates C code, in a file called lex.yy.c, that
simulates this transition diagram.
5. The input file, which we call lex.l, is written in the Lex language
and describes the lexical analyzer to be generated in terms of regular
expressions.
The Lex compiler transforms lex.l into a C program, in a file that is always
named lex.yy.c.
Thereafter, lex.yy.c is compiled by the C compiler into a file called
a.out.
The C compiler output (a.out) is a working lexical analyzer: when it is
executed, it takes a stream of input characters and produces a stream of
tokens.
7. The declarations section includes declarations of variables,
manifest constants (identifiers that stand for a constant, such as
the name of a token), and regular definitions.
The translation rules section consists of rules of the form
Pattern { Action }
Each pattern is a regular expression, which may use the
regular definitions of the declarations section.
The actions are fragments of code, typically written in C.
The auxiliary section holds whatever additional functions
are used in the actions.
8. Example1:
Write a simple Lex source program that recognizes nouns and verbs in a
given set of strings.
Sol:-
9. Example2:
%{
    /* definitions of manifest constants
       LT, LE, EQ, NE, GT, GE,
       IF, THEN, ELSE, ID, NUMBER, RELOP */
%}
/* regular definitions (these belong before the first %%) */
delim    [ \t\n]
ws       {delim}+
letter   [A-Za-z]
digit    [0-9]
id       {letter}({letter}|{digit})*
number   {digit}+(\.{digit}+)?(E[+-]?{digit}+)?
%%
{ws}     {/* no action and no return */}
if       {return(IF);}
then     {return(THEN);}
else     {return(ELSE);}
{id}     {yylval = (int) installID(); return(ID);}
{number} {yylval = (int) installNum(); return(NUMBER);}
…
11. Design of a Lexical Analyzer Generator
Introduction:-
A lexical-analyzer generator is built around finite automata.
There are two approaches, one based on NFAs and one based on DFAs;
the Lex tool uses the DFA-based implementation internally.
12. Structure of the Generated Analyzer:-
The program that serves as the lexical analyzer includes a fixed
program that simulates an automaton built from the given Lex
program.
The diagram below shows the architecture of a lexical analyzer
generated by Lex.
14. The Lex compiler turns the Lex program into a transition table and
a set of actions, which are used by the finite-automaton
simulator.
The automaton is constructed by converting each regular-expression
pattern in the Lex program into an NFA. These NFAs are combined into
one, which is then converted into a DFA, and the corresponding
transition table is built.
The simulator runs this automaton over the input buffer that holds the
source program. Each time a lexeme is accepted, the action of the
matching pattern is executed and the corresponding token is returned.
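The idea of a fixed simulator driven by a generated transition table can be sketched in C. This is only an illustration under stated assumptions, not the code Lex actually emits: the table is written by hand for two hypothetical patterns (identifiers and unsigned integers), and the names scan_token, next_state, accept_token, TOK_ID and TOK_NUMBER are invented for the example.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical token codes; a real Lex program defines its own. */
enum { TOK_NONE = 0, TOK_ID = 1, TOK_NUMBER = 2 };

/* Character classes keep the transition table small. */
enum { C_LETTER, C_DIGIT, C_OTHER, NUM_CLASSES };

/* States: 0 = start, 1 = inside an identifier, 2 = inside a number. */
enum { NUM_STATES = 3, DEAD = -1 };

/* Hand-written transition table (assumed, not generated by Lex). */
static const int next_state[NUM_STATES][NUM_CLASSES] = {
    /* letter  digit  other */
    {  1,      2,     DEAD },   /* start     */
    {  1,      1,     DEAD },   /* in id     */
    {  DEAD,   2,     DEAD },   /* in number */
};

/* Token recognized when the automaton is in each state. */
static const int accept_token[NUM_STATES] = { TOK_NONE, TOK_ID, TOK_NUMBER };

static int char_class(char c)
{
    if (isalpha((unsigned char)c)) return C_LETTER;
    if (isdigit((unsigned char)c)) return C_DIGIT;
    return C_OTHER;
}

/* Scan the longest lexeme at the start of input. Stores its length in *len
   and returns its token code, or TOK_NONE if no pattern matches. */
int scan_token(const char *input, size_t *len)
{
    assert(input != NULL && len != NULL);
    int state = 0, token = TOK_NONE;
    size_t i = 0;
    *len = 0;
    while (input[i] != '\0') {
        int next = next_state[state][char_class(input[i])];
        if (next == DEAD)
            break;                        /* no transition: stop scanning */
        state = next;
        i++;
        if (accept_token[state] != TOK_NONE) {
            token = accept_token[state];  /* remember the last accepting point */
            *len = i;
        }
    }
    return token;
}
```

With this table, scan_token("count1+2", &n) reports an identifier of length 6, because the simulator keeps moving until no transition is possible and then falls back to the last accepting state it saw.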
16. Finite Automata
Introduction:-
Regular expressions are the specification of a lexical analyzer.
Finite automata are the implementation of a lexical analyzer.
A finite automaton consists of
A set of states Q
An input alphabet Σ
A start state q0
A set of transitions: state --input--> state
A set of accepting states F ⊆ Q
17. Transition
s1 --a--> s2
It is read as:
In state s1, on input “a”, go to state s2
After reading all of the input:
If the automaton is in a final state => accept,
otherwise => reject
If no transition is possible at some point => reject
20. Example1
A finite automaton that accepts only “1”
A finite automaton accepts a string if it reaches a final
state after reading the whole input string.
(start state) --1--> (final state)
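The components of a finite automaton and the acceptance rule can be sketched in C as follows. This is a minimal sketch, assuming the input alphabet is {0,1} and that a missing transition is marked with NO_MOVE; the struct layout and the names dfa, dfa_accepts and only_one are illustrative choices, not part of the slides.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_STATES 8
#define ALPHABET   2       /* input alphabet assumed to be {'0', '1'} */
#define NO_MOVE   (-1)     /* marks a missing transition */

struct dfa {
    int num_states;                     /* Q = {0, ..., num_states - 1} */
    int start;                          /* q0 */
    int delta[MAX_STATES][ALPHABET];    /* transitions: state, input -> state */
    int accepting[MAX_STATES];          /* 1 if the state belongs to F */
};

/* Return 1 if the automaton accepts input, 0 otherwise. */
int dfa_accepts(const struct dfa *m, const char *input)
{
    assert(m != NULL && input != NULL);
    int state = m->start;
    for (size_t i = 0; input[i] != '\0'; i++) {
        if (input[i] != '0' && input[i] != '1')
            return 0;                   /* symbol outside the alphabet */
        int next = m->delta[state][input[i] - '0'];
        if (next == NO_MOVE)
            return 0;                   /* no transition possible: reject */
        state = next;
    }
    return m->accepting[state];         /* accept only in a final state */
}

/* The automaton of Example1: it accepts only the string "1". */
static const struct dfa only_one = {
    .num_states = 2,
    .start      = 0,
    .delta      = { { NO_MOVE, 1 }, { NO_MOVE, NO_MOVE } },
    .accepting  = { 0, 1 },
};
```

Calling dfa_accepts(&only_one, "1") returns 1, while "11", "0" and the empty string are rejected, either because no transition exists or because the run ends outside a final state.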
22. Example2
A finite automaton accepting any number of 1’s followed
by a single 0
Where Alphabet: {0,1}
Check that “1110” is accepted but “110…” is not
(start state) --1--> (start state)
(start state) --0--> (final state)
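Example2 can also be coded directly, with one switch case per state instead of a table. The sketch below is one possible encoding; the state names START, FINAL and REJECT and the function name accepts_ones_then_zero are made up for illustration, and REJECT is an extra state added to stand for "no transition possible".

```c
#include <assert.h>
#include <stddef.h>

/* States of the Example2 automaton (names are illustrative). */
enum state { START, FINAL, REJECT };

/* Return 1 if input is any number of 1's followed by a single 0. */
int accepts_ones_then_zero(const char *input)
{
    assert(input != NULL);
    enum state s = START;
    for (size_t i = 0; input[i] != '\0' && s != REJECT; i++) {
        char c = input[i];
        switch (s) {
        case START:
            if (c == '1')      s = START;    /* loop on 1 */
            else if (c == '0') s = FINAL;    /* the single 0 */
            else               s = REJECT;   /* not in the alphabet */
            break;
        case FINAL:
            s = REJECT;                      /* no transition leaves FINAL */
            break;
        case REJECT:
            break;
        }
    }
    return s == FINAL;
}
```

Here accepts_ones_then_zero("1110") returns 1, while an input such as "1101" returns 0 because a symbol arrives after the final state has been reached and no transition is available.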
25. Epsilon Moves
Another kind of transition: ε-moves
• Machine can move from state A to state B
without reading input
A --ε--> B
26. Deterministic and Nondeterministic Automata
Deterministic Finite Automata (DFA)
One transition per input per state
No ε-moves
Nondeterministic Finite Automata (NFA)
Can have multiple transitions for one input in a given state
Can have ε-moves
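One way to see the difference in practice is to simulate an NFA by tracking the set of states it could be in. The sketch below does this for a small hypothetical NFA that is not taken from the slides: it has one ε-move and two transitions on input 1 from the same state, and it accepts strings over {0,1} that end in "10". State sets are stored as bit masks; the names eps_closure and nfa_accepts are assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_STATES 4
typedef unsigned int state_set;    /* bit i set <=> state i is in the set */

/* Hypothetical NFA over {0,1} accepting strings that end in "10":
   0 --eps--> 1,  1 --0--> 1,  1 --1--> {1, 2},  2 --0--> 3 (accepting). */
static const state_set eps_moves[NUM_STATES] = { 1u << 1, 0, 0, 0 };
static const state_set moves[NUM_STATES][2] = {
    /* on 0      on 1 */
    { 0,        0                     },   /* state 0 */
    { 1u << 1,  (1u << 1) | (1u << 2) },   /* state 1: two moves on 1 */
    { 1u << 3,  0                     },   /* state 2 */
    { 0,        0                     },   /* state 3 */
};
static const state_set accepting = 1u << 3;

/* Add every state reachable through eps-moves alone. */
static state_set eps_closure(state_set set)
{
    state_set prev;
    do {
        prev = set;
        for (int s = 0; s < NUM_STATES; s++)
            if (set & (1u << s))
                set |= eps_moves[s];
    } while (set != prev);
    return set;
}

/* Return 1 if some path through the NFA accepts input, 0 otherwise. */
int nfa_accepts(const char *input)
{
    assert(input != NULL);
    state_set current = eps_closure(1u << 0);    /* start in state 0 */
    for (size_t i = 0; input[i] != '\0'; i++) {
        if (input[i] != '0' && input[i] != '1')
            return 0;                            /* symbol outside the alphabet */
        state_set next = 0;
        for (int s = 0; s < NUM_STATES; s++)
            if (current & (1u << s))
                next |= moves[s][input[i] - '0'];
        current = eps_closure(next);
        if (current == 0)
            return 0;                            /* every path is stuck */
    }
    return (current & accepting) != 0;
}
```

A DFA needs only a single current state, as in the earlier examples, whereas this simulation has to carry a whole set of states because of the ε-move and the double transition on 1.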