3. Chapter 1
Outline
• Introduction
• Programs related to compilers
• Phases of a Compiler
• Analysis
• Lexical analysis
• Syntax analysis
• Semantic analysis
4. Introduction
What is a compiler?
• A program that reads a program written in one language (the
source language) and translates it into an equivalent
program in another language (the target language).
• Why do we study compiler construction techniques?
• Compilers provide an essential interface between
applications and architectures.
Figure: Source program (high-level language) → Compiler → Target program (assembly or machine language), with error messages as a second output. The target program (exe) then maps Input → Output.
5. Programs related to compilers
These programs are known as the cousins of the compiler: they
form the context in which a compiler typically operates.
They are: preprocessors, interpreters, assemblers,
linkers, and loaders.
Preprocessors
• A preprocessor is a separate program that is called by
the compiler before actual translation begins.
• The output of a preprocessor may be given as the input
to the compiler.
6. Programs related to compilers…
Tasks performed by preprocessors include:
1. Macro processing: a preprocessor may allow a user to define
macros that are shorthands for longer constructs.
2. File inclusion: a preprocessor may include header files in
the program text.
3. Rational preprocessors: these preprocessors augment older
languages with more modern flow-of-control and data-structuring
facilities.
4. Language extensions: these preprocessors attempt to add
capabilities to the language by what amounts to built-in macros.
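A toy sketch of the first two tasks, assuming a hypothetical `preprocess` function that handles only `#include "name"` and object-like `#define NAME text`, with header files supplied as an in-memory dictionary instead of being read from disk. Real C preprocessors also handle function-like macros, conditional compilation, and rescanning of macro bodies, none of which is modeled here.

```python
import re

def preprocess(source, headers):
    """Hypothetical toy preprocessor: #include "name" and object-like #define only."""
    macros = {}
    out = []

    def process(text):
        for line in text.splitlines():
            stripped = line.strip()
            include = re.fullmatch(r'#include\s+"([^"]+)"', stripped)
            if include:
                # File inclusion: splice the header's text in place of the directive.
                process(headers[include.group(1)])
                continue
            define = re.fullmatch(r"#define\s+(\w+)\s+(.*)", stripped)
            if define:
                # Macro definition: remember NAME -> replacement text.
                macros[define.group(1)] = define.group(2)
                continue
            # Macro processing: replace every whole word that names a macro.
            out.append(re.sub(r"\w+", lambda m: macros.get(m.group(0), m.group(0)), line))

    process(source)
    return "\n".join(out)


headers = {"rate.h": "#define RATE 60"}  # stands in for a header file on disk
src = '#include "rate.h"\n#define INIT 0\nposition = INIT + rate * RATE;'
print(preprocess(src, headers))  # position = 0 + rate * 60;
```

The compiler proper would then receive only the expanded line, never the directives.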
7. Programs related to compilers
Interpreter
• Is a program that works by analyzing and executing the
source program's commands one at a time
• Does not translate the whole source program into object
code
• Interpretation is important when:
– The programmer is working in interactive mode and needs
to view and update variables
– Running speed is not important
– Commands have simple formats and can thus be
quickly analyzed and executed
– Modifications or additions to user programs are required as
execution proceeds
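A minimal sketch of interpretation, assuming a hypothetical three-command language (`set`, `add`, `print`): each command is analyzed and executed immediately, and nothing is translated ahead of time. Because execution proceeds statement by statement, the commands before an error have already taken effect when the error is reported.

```python
def run(commands, env):
    """Interpret a hypothetical 3-command language, one command at a time."""
    output = []
    for line in commands:
        op, *args = line.split()          # analyze the current command
        if op == "set":                   # set NAME VALUE
            env[args[0]] = int(args[1])
        elif op == "add":                 # add NAME VALUE
            env[args[0]] += int(args[1])
        elif op == "print":               # print NAME
            output.append(env[args[0]])
        else:                             # stop at the first error
            raise ValueError(f"unknown command: {op}")
    return output


env = {}                                  # variables stay visible between calls
print(run(["set x 5", "add x 3", "print x"], env))  # [8]
try:
    run(["set y 1", "oops y"], env)
except ValueError as err:
    print(err, env["y"])                  # unknown command: oops 1
```

Keeping `env` alive between calls is what lets an interactive user inspect and update variables as execution proceeds.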
8. Programs related to compilers
Interpreter and compiler
Figure: (a) Compiler: source code → compilation → exe code → machine processing. (b) Interpreter: source code → compilation → intermediate code → interpretation by the interpreter.
NB: Compilers generate machine code, whereas interpreters interpret intermediate code
9. Programs related to compilers
Differences between an interpreter and a compiler
• Translation: an interpreter takes one statement, translates
it, executes it, and then takes the next statement; a compiler
translates the entire program in one go, and the translated
program is executed afterwards.
• Errors: an interpreter stops translating at the first error;
a compiler reports errors after translating the entire program.
• Analysis time: an interpreter takes less time to analyze the
source code; a compiler takes a large amount of time analyzing
and processing the high-level language code.
• Execution speed: overall execution is slower with an
interpreter and faster with compiled code.
10. E.g., Compiling Java Programs
Figure: Java program → compiler → Java bytecode → interpreter on Win, Mac, or Unix.
The Java compiler produces bytecode, not machine code.
Bytecode is converted into machine code using a Java
interpreter.
You can run bytecode on any computer that has a Java
interpreter installed.
11. Java
A Java source program may first be compiled into an
intermediate form called bytecodes. The bytecodes are then
interpreted by a virtual machine.
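The two-step scheme can be sketched with a hypothetical stack-based instruction set (`PUSH`, `LOAD`, `ADD`, `MUL`) that is far simpler than real JVM bytecode: `compile_expr` translates an expression tree once, and `run_vm` interprets the resulting instructions. Any machine with an implementation of `run_vm` could run the same instruction list.

```python
OPCODES = {"+": "ADD", "*": "MUL"}   # hypothetical instruction set, not real JVM opcodes

def compile_expr(node, code):
    """Compile a tuple tree (op, left, right) into stack instructions, post-order."""
    if isinstance(node, tuple):
        op, left, right = node
        compile_expr(left, code)
        compile_expr(right, code)
        code.append((OPCODES[op],))
    elif isinstance(node, str):
        code.append(("LOAD", node))      # push a variable's value
    else:
        code.append(("PUSH", node))      # push a numeric constant
    return code

def run_vm(code, env):
    """Toy virtual machine: interpret the instructions one at a time on a stack."""
    stack = []
    for instr in code:
        if instr[0] == "PUSH":
            stack.append(instr[1])
        elif instr[0] == "LOAD":
            stack.append(env[instr[1]])
        else:
            right, left = stack.pop(), stack.pop()   # top of stack is the right operand
            stack.append(left + right if instr[0] == "ADD" else left * right)
    return stack.pop()


bytecode = compile_expr(("+", "initial", ("*", "rate", 60)), [])
print(bytecode)
# [('LOAD', 'initial'), ('LOAD', 'rate'), ('PUSH', 60), ('MUL',), ('ADD',)]
print(run_vm(bytecode, {"initial": 10.0, "rate": 2.5}))  # 160.0
```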
12. Programs related to compilers…
Assemblers
• An assembler is a translator for assembly language.
• Assembly code is translated into machine code.
• The output is relocatable machine code.
Linker
• Links separately assembled object files.
• Links object files to standard library functions.
• Generates a file that can be loaded and executed.
Loader
• Loads the executable code produced by the linker
into main memory.
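To make relocation concrete, here is a sketch of a two-pass linker over a hypothetical object-module format: each module is a dict with a list of code words, the symbols it defines (name → offset within the module), and relocation entries (offset → symbol whose final address must be patched in). The `base` parameter plays the role of the load address a loader would choose; real object formats such as ELF are much richer.

```python
def link(modules, base=0):
    """Hypothetical two-pass linker: lay modules out in order, then patch references."""
    # Pass 1: give each module a start address and build the global symbol table.
    symbols = {}
    addr = base
    for mod in modules:
        for name, offset in mod["defines"].items():
            if name in symbols:
                raise ValueError(f"duplicate symbol: {name}")
            symbols[name] = addr + offset
        addr += len(mod["code"])
    # Pass 2: concatenate the code, filling each relocation slot with the final address.
    image = []
    for mod in modules:
        code = list(mod["code"])
        for offset, name in mod["relocs"]:
            if name not in symbols:
                raise ValueError(f"undefined symbol: {name}")
            code[offset] = symbols[name]
        image.extend(code)
    return image, symbols


# Hypothetical modules: a 0 in the code marks a slot the linker must fill.
prog = {"code": [10, 0, 20, 0], "defines": {"main": 0},
        "relocs": [(1, "sqrt"), (3, "rate")]}
lib = {"code": [30, 40, 7], "defines": {"sqrt": 0, "rate": 2}, "relocs": []}
print(link([prog, lib]))            # ([10, 4, 20, 6, 30, 40, 7], {'main': 0, 'sqrt': 4, 'rate': 6})
print(link([prog, lib], base=100))  # ([10, 104, 20, 106, 30, 40, 7], {'main': 100, 'sqrt': 104, 'rate': 106})
```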
14. Programs related to compilers…
Figure: C or C++ program → Preprocessor → C/C++ program with macro substitutions (expansions) and file inclusions → Compiler → Assembly code → Assembler → Relocatable object module → Linker (together with other relocatable object modules or library modules) → Executable code → Loader → Absolute machine code.
Together, these programs are called the language-processing system (or program execution system).
15. The Analysis-Synthesis Model of Compilation
• There are two parts to compilation: analysis and synthesis.
• The analysis part breaks up the source program into constituent
pieces and creates an intermediate representation of the source
program.
The lexical analyzer, syntax analyzer, and semantic
analyzer belong to this part.
• The synthesis part constructs the desired target program from
the intermediate representation.
The intermediate code generator, code optimizer, and code
generator belong to this part.
• Of the two parts, synthesis requires the more specialized
techniques.
16. The Analysis-Synthesis Model of Compilation
Analysis (front end)
• Machine independent / language dependent
Synthesis (back end)
• Machine dependent / language independent
17. The Phases of a Compiler
Conceptually, a compiler operates in
phases, each of which transforms the
source program from one representation
to another.
The first three phases form the bulk
of the analysis portion of a compiler.
Two other activities, symbol-table
management and error handling, are
shown interacting with the six phases of
lexical analysis, syntax analysis,
semantic analysis, intermediate code
generation, code optimization, and code
generation.
18. The Analysis-Synthesis Model of Compilation
• During analysis, the operations implied by the source
program are determined and recorded in a hierarchical
structure called a tree.
• Often, a special kind of tree called a syntax tree is
used, in which each node represents an operation and
the children of a node represent the arguments of the
operation.
• For example, a syntax tree for an assignment statement
is shown in the figure below.
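A syntax tree for `position = initial + rate * 60` can be sketched with a hypothetical `Node` class whose `children` are the arguments of the node's operation. The `*` node sits below the `+` node because multiplication binds more tightly.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    """Hypothetical syntax-tree node: an operation or leaf label plus argument subtrees."""
    label: str
    children: list = field(default_factory=list)

def show(node, depth=0):
    """Render the tree as indented lines, parent before children."""
    lines = ["  " * depth + node.label]
    for child in node.children:
        lines.extend(show(child, depth + 1))
    return lines


tree = Node("=", [Node("position"),
                  Node("+", [Node("initial"),
                             Node("*", [Node("rate"), Node("60")])])])
print("\n".join(show(tree)))
```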
19. Analysis of the source program
Analysis consists of three phases:
• Linear (lexical) analysis
• Hierarchical (syntax) analysis
• Semantic analysis
20. 1. Lexical analysis or Scanning
• The lexical analyzer reads the source program character by character
and returns the tokens of the source program.
• A token represents a sequence of characters that has a collective
meaning in the source program (such as identifiers, operators,
keywords, numbers, delimiters, and so on).
• A lexical analyzer is also called a lexer or a scanner.
• It receives a stream of characters from the source program
and groups them into tokens.
• Blanks, newlines, and tab characters are removed during
lexical analysis.
Figure: Source program → Lexical analyzer → Stream of tokens
21. Lexical Analysis
For example, in lexical analysis the characters in the assignment
statement position = initial + rate * 60 would be grouped into the
following tokens:
1. The identifier position
2. The assignment symbol =
3. The identifier initial
4. The plus sign +
5. The identifier rate
6. The multiplication sign *
7. The number 60
• Regular expressions are used to describe tokens (lexical constructs).
• A (Deterministic) Finite State Automaton can be used in the
implementation of a lexical analyzer.
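A regex-driven scanner for the example statement might look like this; the token class names (`ID`, `NUMBER`, `ASSIGN`, `PLUS`, `TIMES`) are assumptions for illustration. Python's `re` module is a backtracking matcher rather than a DFA, so this illustrates describing tokens with regular expressions, not the automaton-based implementation; scanner generators such as lex/flex compile the same kind of specification into a DFA.

```python
import re

# Hypothetical token classes, each described by a regular expression.
TOKEN_SPEC = [
    ("NUMBER", r"\d+(?:\.\d+)?"),
    ("ID", r"[A-Za-z_]\w*"),
    ("ASSIGN", r"="),
    ("PLUS", r"\+"),
    ("TIMES", r"\*"),
    ("SKIP", r"[ \t\r\n]+"),   # blanks, tabs, newlines: discarded
    ("ERROR", r"."),           # anything else is a lexical error
]
MASTER = re.compile("|".join(f"(?P<{name}>{regex})" for name, regex in TOKEN_SPEC))

def tokenize(text):
    """Group the characters of `text` into (token class, lexeme) pairs."""
    tokens = []
    for match in MASTER.finditer(text):
        kind, lexeme = match.lastgroup, match.group()
        if kind == "SKIP":
            continue
        if kind == "ERROR":
            raise ValueError(f"unexpected character {lexeme!r} at offset {match.start()}")
        tokens.append((kind, lexeme))
    return tokens


print(tokenize("position = initial + rate * 60"))
# [('ID', 'position'), ('ASSIGN', '='), ('ID', 'initial'), ('PLUS', '+'),
#  ('ID', 'rate'), ('TIMES', '*'), ('NUMBER', '60')]
```

The seven pairs correspond one-to-one to the seven tokens listed above.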
22. 2. Syntax analysis or Parsing
• The parser receives the source code in the form of tokens
from the scanner and performs syntax analysis.
• The results of syntax analysis are usually represented by a
parse tree or a syntax tree.
• In a syntax tree, each interior node represents an operation
and the children of the node represent the arguments of the
operation.
• The syntactic structure of a programming language is
specified by a context-free grammar (CFG).
Figure: Stream of tokens → Syntax analyzer → Abstract syntax tree
23. Syntax analysis or Parsing…
• Example: consider the parse tree for the following C statement:
position = initial + rate * 60
Figure: Parse tree for position = initial + rate * 60
Usually, the grammatical phrases of the source program are
represented by a parse tree.
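A recursive-descent parser is one common way to implement syntax analysis. The sketch below assumes a hypothetical grammar (assign → ID = expr, expr → term { + term }, term → factor { * factor }, factor → ID | NUMBER) and takes tokens as (kind, lexeme) pairs. It builds a syntax tree as nested tuples rather than a full parse tree; the loops make + and * left-associative, and handling * one level deeper gives it higher precedence.

```python
class Parser:
    """Recursive-descent parser for a hypothetical grammar:
         assign -> ID '=' expr
         expr   -> term { '+' term }
         term   -> factor { '*' factor }
         factor -> ID | NUMBER
    """

    def __init__(self, tokens):
        self.tokens = tokens          # list of (kind, lexeme) pairs
        self.pos = 0

    def peek(self):
        """Kind of the current token, or 'EOF' after the last one."""
        return self.tokens[self.pos][0] if self.pos < len(self.tokens) else "EOF"

    def expect(self, kind):
        """Consume a token of the given kind and return its lexeme."""
        if self.peek() != kind:
            raise ValueError(f"expected {kind}, found {self.peek()}")
        lexeme = self.tokens[self.pos][1]
        self.pos += 1
        return lexeme

    def assign(self):
        target = self.expect("ID")
        self.expect("ASSIGN")
        tree = ("=", target, self.expr())
        if self.peek() != "EOF":
            raise ValueError(f"unexpected {self.peek()} after expression")
        return tree

    def expr(self):
        node = self.term()
        while self.peek() == "PLUS":      # loop => left-associative '+'
            self.pos += 1
            node = ("+", node, self.term())
        return node

    def term(self):
        node = self.factor()
        while self.peek() == "TIMES":     # one level deeper => '*' binds tighter
            self.pos += 1
            node = ("*", node, self.factor())
        return node

    def factor(self):
        return self.expect("NUMBER") if self.peek() == "NUMBER" else self.expect("ID")


tokens = [("ID", "position"), ("ASSIGN", "="), ("ID", "initial"), ("PLUS", "+"),
          ("ID", "rate"), ("TIMES", "*"), ("NUMBER", "60")]
print(Parser(tokens).assign())
# ('=', 'position', ('+', 'initial', ('*', 'rate', '60')))
```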
24. 3. Semantic analysis
The semantics of a program are its meaning, as opposed
to its syntax or structure.
The semantics consist of:
– Runtime semantics: the behavior of the program at runtime
– Static semantics: properties checked by the compiler
Static semantics include:
– Declaring variables and constants before use
– Calling functions that exist (predefined in a library or defined by
the user)
– Passing parameters properly
– Type checking
The semantic analyzer does the following:
– Checks the static semantics of the language
– Annotates the syntax tree with type information
25. Semantic analysis…
Example: consider again the C statement position = initial + rate * 60.
Figure: Semantic analysis inserts a conversion from integer to real.
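A sketch of this type-checking step, assuming (as a hypothetical symbol table) that `position`, `initial`, and `rate` are declared real and that integer literals have type int. The `check` function walks the tree bottom-up, and wherever an int operand meets a real one it wraps the int side in an `inttoreal` node; only the two types int and real are modeled.

```python
def check(node, symtab):
    """Type-check a tuple syntax tree; return (annotated tree, type).

    Leaves are identifier names (looked up in `symtab`) or numeric literals.
    Only two types are modeled: 'int' and 'real'.
    """
    if isinstance(node, int):
        return node, "int"
    if isinstance(node, float):
        return node, "real"
    if isinstance(node, str):
        if node not in symtab:            # static check: declared before use
            raise NameError(f"undeclared identifier: {node}")
        return node, symtab[node]
    op, left, right = node
    left, ltype = check(left, symtab)
    right, rtype = check(right, symtab)
    if op == "=" and ltype == "int" and rtype == "real":
        raise TypeError(f"cannot assign a real value to int variable {left}")
    if ltype == "int" and rtype == "real":
        left = ("inttoreal", left)        # coerce the int operand
    elif ltype == "real" and rtype == "int":
        right = ("inttoreal", right)
    return (op, left, right), ("real" if "real" in (ltype, rtype) else "int")


# Assumed declarations for the example (a hypothetical symbol table).
symtab = {"position": "real", "initial": "real", "rate": "real"}
typed, ty = check(("=", "position", ("+", "initial", ("*", "rate", 60))), symtab)
print(typed, ty)
# ('=', 'position', ('+', 'initial', ('*', 'rate', ('inttoreal', 60)))) real
```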
26. Synthesis of the target program
• Intermediate code generator
• Target code (program) generator
27. Intermediate code generator
• Comes after syntax and semantic analysis.
• The intermediate representation should have two important
properties:
– It should be easy to produce.
– It should be easy to translate into the target program.
• The intermediate representation (IR) can take a variety of forms:
– Three-address code, postfix notation, tree or DAG representation
• The most commonly used representation is three-address code.
Figure: Abstract syntax tree → Intermediate code generator → Intermediate code
The three-address code (IR) for the original C statement, where id1, id2,
and id3 stand for position, initial, and rate, is:
temp1 = inttoreal ( 60 )
temp2 = id3 * temp1
temp3 = id2 + temp2
id1 = temp3
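This three-address code can be produced by a post-order walk of the annotated syntax tree, allocating a fresh temporary for each interior node. The sketch assumes the tree is given as nested tuples with a unary `inttoreal` node already inserted by semantic analysis; `gen_tac` and the tuple layout are hypothetical.

```python
def gen_tac(tree):
    """Emit three-address code for a tuple syntax tree, post-order, one temp per interior node."""
    code = []
    count = 0

    def new_temp():
        nonlocal count
        count += 1
        return f"temp{count}"

    def walk(node):
        if not isinstance(node, tuple):          # identifier or constant: use it directly
            return str(node)
        if node[0] == "inttoreal":               # unary conversion node
            arg = walk(node[1])
            temp = new_temp()
            code.append(f"{temp} = inttoreal ( {arg} )")
            return temp
        op, left, right = node
        if op == "=":                            # assignment: copy the value into the target
            value = walk(right)
            code.append(f"{left} = {value}")
            return left
        lhs, rhs = walk(left), walk(right)       # operands first, then the operation
        temp = new_temp()
        code.append(f"{temp} = {lhs} {op} {rhs}")
        return temp

    walk(tree)
    return code


# id1, id2, id3 stand for the symbol-table entries of position, initial, rate.
tree = ("=", "id1", ("+", "id2", ("*", "id3", ("inttoreal", 60))))
print("\n".join(gen_tac(tree)))
```

Run on this tree, the function prints the same four lines as the listing above.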
28. Code generator
• The machine code generator receives the (optimized)
intermediate code, and then it produces either:
– Machine code for a specific machine, or
– Assembly code for a specific machine and assembler.
• Code generator
– Selects appropriate machine instructions
– Allocates memory locations for variables
– Allocates registers for intermediate computations
29. Code generator…
• The code generator takes the IR code and generates code for the
target machine.
• Here we write the target code in assembly language for:
position = initial + rate * 60
NB: The input is the three-address IR above after code optimization, i.e.
temp1 = id3 * 60.0 followed by id1 = id2 + temp1 (the conversion of 60 to
real is done at compile time).
• Using registers R1 and R2, the translation of the given example is:
MOV id3, R2
MUL #60.0, R2
MOV id2, R1
ADD R2, R1
MOV R1, id1
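A deliberately naive code generator in this style can be sketched as follows. It assumes the optimized IR `temp1 = id3 * 60.0` and `id1 = id2 + temp1`, that every instruction has the form `dest = a op b`, that names starting with `temp` are temporaries kept in registers, that each temporary is used once, and that two registers suffice (there is no spilling). Handing out registers from the end of the free list happens to reproduce the register choice of the listing above.

```python
import re

OPS = {"+": "ADD", "-": "SUB", "*": "MUL", "/": "DIV"}

def operand(x, regs):
    """Temporaries -> their register, numeric literals -> #immediate, names unchanged."""
    if x in regs:
        return regs[x]
    if re.fullmatch(r"\d+(\.\d+)?", x):
        return f"#{x}"
    return x

def codegen(tac):
    """Naive generator: 'dest = a op b' becomes MOV a, R / OP b, R (two registers, no spilling)."""
    free = ["R1", "R2"]          # pop() hands out R2 first
    regs = {}                    # temporary name -> register holding its value
    asm = []
    for instr in tac:
        dest, expr = (part.strip() for part in instr.split("="))
        a, op, b = expr.split()
        reg = free.pop()
        asm.append(f"MOV {operand(a, regs)}, {reg}")
        asm.append(f"{OPS[op]} {operand(b, regs)}, {reg}")
        for t in (a, b):                     # assumption: a temporary dies after one use
            if t in regs:
                free.append(regs.pop(t))
        if dest.startswith("temp"):          # keep temporaries in registers
            regs[dest] = reg
        else:                                # store program variables back to memory
            asm.append(f"MOV {reg}, {dest}")
            free.append(reg)
    return asm


optimized = ["temp1 = id3 * 60.0", "id1 = id2 + temp1"]  # assumed optimizer output
print("\n".join(codegen(optimized)))
```

A real code generator would also track which values are already in registers and spill to memory when registers run out.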
30. Major Data Structures in a Compiler
• Token
– Represented by an integer value or an
enumeration literal
– Sometimes it is necessary to preserve the string
of characters that was scanned, for example the name of an
identifier or the value of a literal
• Syntax tree
– Constructed as a pointer-based structure
– Dynamically allocated as parsing proceeds
– Nodes have fields containing information
collected by the parser and semantic analyzer
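These two structures can be sketched with Python's `enum` and `dataclasses` modules; `TokenKind`, `Token`, `TreeNode`, and their fields are illustrative assumptions, not a prescribed layout. Python object references stand in for the pointers of a C-style implementation.

```python
from dataclasses import dataclass, field
from enum import Enum, auto
from typing import Optional

class TokenKind(Enum):
    """Token classes as enumeration literals (hypothetical set)."""
    ID = auto()
    NUMBER = auto()
    ASSIGN = auto()
    PLUS = auto()
    TIMES = auto()

@dataclass(frozen=True)
class Token:
    kind: TokenKind
    lexeme: str = ""   # kept only when needed: identifier name or literal value

@dataclass
class TreeNode:
    op: str                                        # set by the parser
    children: list = field(default_factory=list)   # references to subtrees
    token: Optional[Token] = None                  # leaf token, if any
    data_type: Optional[str] = None                # filled in by the semantic analyzer


plus = Token(TokenKind.PLUS)                       # operators need no lexeme
rate = TreeNode("id", token=Token(TokenKind.ID, "rate"))
sixty = TreeNode("num", token=Token(TokenKind.NUMBER, "60"))
product = TreeNode("*", [rate, sixty])
rate.data_type, sixty.data_type, product.data_type = "real", "int", "real"  # as semantic analysis would
print(product.children[1].token.lexeme, product.data_type)  # 60 real
```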
31. Major Data Structures in a Compiler…
• Symbol table
– Keeps information associated with all kinds of
tokens:
  • identifiers, numbers, variables, functions,
  parameters, types, fields, etc.
– Tokens are entered into it by the scanner and parser
– The code generation and optimization phases use the
information in the symbol table
– Performance issues
  • Insertion, deletion, and search operations need to
  be efficient because they are frequent
  • More than one symbol table may be used
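A common design, sketched here as a hypothetical `SymbolTable` class, keeps one hash map per scope on a stack (so several tables are in use at once): insertion and search within a scope take constant time on average, lookup searches from the innermost scope outward, and leaving a scope deletes all of its entries at once.

```python
class SymbolTable:
    """Hypothetical scoped symbol table: a stack of dicts, one per scope."""

    def __init__(self):
        self.scopes = [{}]                    # index 0 is the global scope

    def enter_scope(self):
        self.scopes.append({})

    def exit_scope(self):
        if len(self.scopes) == 1:
            raise RuntimeError("cannot leave the global scope")
        self.scopes.pop()                     # deletes every entry of the innermost scope

    def insert(self, name, **attrs):
        scope = self.scopes[-1]
        if name in scope:
            raise ValueError(f"{name} already declared in this scope")
        scope[name] = attrs                   # e.g. kind, type, storage location

    def lookup(self, name):
        for scope in reversed(self.scopes):   # innermost declaration wins
            if name in scope:
                return scope[name]
        return None


st = SymbolTable()
st.insert("rate", kind="variable", type="real")
st.enter_scope()
st.insert("rate", kind="parameter", type="int")
print(st.lookup("rate"))  # {'kind': 'parameter', 'type': 'int'}
st.exit_scope()
print(st.lookup("rate"))  # {'kind': 'variable', 'type': 'real'}
```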