Rift Valley University Harar Campus
Department of Computer Science
Course: Compiler Design
Course Code: CoSc3102
Credit hour: 3
Target group: BSc in Computer Science, 3rd year, 2nd semester (Weekend)
Instructor: Gadisa A. (MSc).
Delivery Mode
• Lecture
• Quiz
• Group and Individual Assignments
• Team/Project Work
Chapter 1
Outline
• Introduction
• Programs related to compiler
• Phases of a Compiler
• Analysis
• Lexical analysis
• Syntax analysis
• Semantic analysis
Introduction
What is a compiler?
• A program that reads a program written in one language (the
source language) and translates it into an equivalent
program in another language (the target language).
• Why do we study compiler construction techniques?
• Compilers provide an essential interface between
applications and architectures
Figure: Source program (high-level language) → Compiler → Target program (assembly or machine language), with error messages reported by the compiler; the target program (exe) then transforms input into output.
Programs related to compilers
These programs are known as the cousins of the compiler, i.e., the programs that form the context in which a compiler typically operates.
They are: preprocessors, interpreters, assemblers, linkers, and loaders.
Preprocessors
• A preprocessor is a separate program that is called by
the compiler before actual translation begins.
• The output of a preprocessor may be given as input
to the compiler.
Programs related to compilers…
Tasks performed by preprocessors:
1. Macro processing: a preprocessor may allow a user to define
macros that are shorthands for longer constructs.
2. File inclusion: a preprocessor may include header files into
the program text.
3. Rational preprocessors: these preprocessors augment older
languages with more modern flow-of-control and data-structuring
facilities.
4. Language extensions: these preprocessors attempt to add
capabilities to the language by what amounts to built-in macros.
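Macro processing and file inclusion are text-to-text transformations that run before the compiler proper. The Python sketch below is a simplified, hypothetical preprocessor (the function name `preprocess` and the in-memory `headers` dictionary are illustrative assumptions): it handles only `#include "file"` and object-like `#define NAME text`. Unlike a real C preprocessor, it expands a macro body when the macro is defined, does not protect string literals from substitution, and has no conditionals or function-like macros.

```python
import re

IDENTIFIER = re.compile(r"[A-Za-z_]\w*")

def preprocess(source: str, headers: dict[str, str]) -> str:
    """Expand #include and object-like #define directives (simplified sketch)."""
    macros: dict[str, str] = {}
    output: list[str] = []

    def expand(text: str) -> str:
        # Replace every identifier that names a macro by the macro's body.
        return IDENTIFIER.sub(lambda m: macros.get(m.group(0), m.group(0)), text)

    pending = source.splitlines()
    while pending:
        line = pending.pop(0)
        stripped = line.strip()
        if stripped.startswith("#include"):
            # File inclusion: splice the header's lines in place of the directive.
            name = stripped.split('"')[1]
            pending = headers[name].splitlines() + pending
        elif stripped.startswith("#define"):
            # Macro definition: remember NAME -> body (the body may be empty).
            parts = stripped.split(maxsplit=2)
            macros[parts[1]] = expand(parts[2]) if len(parts) > 2 else ""
        else:
            # Ordinary program text: substitute macros and pass it on to the compiler.
            output.append(expand(line))
    return "\n".join(output)


headers = {"config.h": "#define SIZE 100"}
program = '#include "config.h"\n#define LIMIT SIZE\nint a[LIMIT];'
print(preprocess(program, headers))  # int a[100];
```

The compiler then sees only the expanded line, which is why the preprocessor's output is described as the compiler's input.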
Programs related to compilers
Interpreter
• A program that works by analyzing and executing the
source program's commands one at a time
• Does not translate the whole source program into object
code
• Interpretation is useful when:
– The programmer is working in interactive mode and needs
to view and update variables
– Running speed is not important
– Commands have simple formats, and thus can be
quickly analyzed and executed
– User programs must be modified or extended as
execution proceeds
Programs related to compilers
Interpreter and compiler
Figure: (a) Compiler: source code → compilation → executable (machine) code → processing by the machine. (b) Interpreter: source code → compilation → intermediate code → interpretation by the interpreter.
NB: Compilers generate machine code, whereas interpreters interpret intermediate code
Programs related to compilers
Differences between an interpreter and a compiler
• Translation: an interpreter takes one statement, translates and
executes it, and then takes the next statement; a compiler
translates the entire program in one go, and the translated
program is executed afterwards.
• Errors: an interpreter stops translating at the first error;
a compiler generates its error report after translating the
entire program.
• Analysis time: an interpreter takes less time to analyze the
source code; a compiler takes a large amount of time to analyze
and process the high-level language code.
• Execution speed: overall execution is slower with an
interpreter and faster with compiled code.
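The difference in when errors are reported can be seen in a toy setting. The sketch below assumes a made-up three-command language (`let NAME NUMBER`, `add NAME NAME`, `print NAME`); all function names are hypothetical. `interpret` analyzes and executes one statement at a time and stops at the first error, while `compile_and_run` first translates the whole program into an internal list of statements, reports every error it finds, and only then executes. The "compiled" form here is a Python list, not machine code, so the sketch illustrates only the order of translation and execution.

```python
class ParseError(Exception):
    """Raised when a statement of the toy language cannot be analyzed."""

def parse(line: str, lineno: int) -> tuple:
    """Analyze one statement and return it as a tuple, or raise ParseError."""
    words = line.split()
    if len(words) == 3 and words[0] == "let" and words[2].isdigit():
        return ("let", words[1], int(words[2]))
    if len(words) == 3 and words[0] == "add":
        return ("add", words[1], words[2])
    if len(words) == 2 and words[0] == "print":
        return ("print", words[1])
    raise ParseError(f"line {lineno}: cannot parse {line!r}")

def execute(stmt: tuple, env: dict, out: list) -> None:
    """Carry out one analyzed statement."""
    if stmt[0] == "let":
        env[stmt[1]] = stmt[2]
    elif stmt[0] == "add":
        env[stmt[1]] += env[stmt[2]]
    else:  # "print"
        out.append(str(env[stmt[1]]))

def interpret(lines: list) -> tuple:
    """Interpreter style: analyze and execute each statement in turn; stop at the first error."""
    env, out = {}, []
    for n, line in enumerate(lines, 1):
        try:
            stmt = parse(line, n)
        except ParseError as err:
            return out, [str(err)]
        execute(stmt, env, out)
    return out, []

def compile_and_run(lines: list) -> tuple:
    """Compiler style: translate the whole program, report all errors, then execute."""
    program, errors = [], []
    for n, line in enumerate(lines, 1):
        try:
            program.append(parse(line, n))
        except ParseError as err:
            errors.append(str(err))
    if errors:
        return [], errors  # nothing runs if translation failed
    env, out = {}, []
    for stmt in program:
        execute(stmt, env, out)
    return out, []


faulty = ["let x 2", "print x", "oops", "let y 3", "bad line here"]
print(interpret(faulty))        # (['2'], ["line 3: cannot parse 'oops'"])
print(compile_and_run(faulty))  # ([], ["line 3: cannot parse 'oops'", "line 5: cannot parse 'bad line here'"])
```

The interpreter has already produced output before it meets the error on line 3, while the compiler-style version executes nothing and reports both bad lines.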
E.g., Compiling Java Programs
Figure: Java program → Java compiler → Java bytecode → Java interpreter (on Win, Mac, or Unix)
• The Java compiler produces bytecode, not machine code
• Bytecode is converted into machine code by a Java
interpreter
• You can run bytecode on any computer that has a Java
interpreter installed
Java
A Java source program may first be
compiled into an intermediate form
called bytecodes.
The bytecodes are then interpreted by a
virtual machine.
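The idea of compiling once to bytecode and interpreting it on a virtual machine can be sketched with a small stack machine. The JVM is also stack-based, but the opcodes below (`PUSH_CONST`, `LOAD`, `STORE`, `ADD`, `MUL`) and the `run` function are a hypothetical miniature, not real JVM instructions, and the variable values are made up for the example.

```python
from enum import Enum, auto

class Op(Enum):
    PUSH_CONST = auto()  # push a literal value
    LOAD = auto()        # push the value of a variable
    STORE = auto()       # pop a value into a variable
    ADD = auto()         # pop two values, push their sum
    MUL = auto()         # pop two values, push their product

def run(bytecode: list, variables: dict) -> dict:
    """Interpret (opcode, operand) pairs using an operand stack."""
    stack = []
    for op, arg in bytecode:
        if op is Op.PUSH_CONST:
            stack.append(arg)
        elif op is Op.LOAD:
            stack.append(variables[arg])
        elif op is Op.STORE:
            variables[arg] = stack.pop()
        elif op is Op.ADD:
            right, left = stack.pop(), stack.pop()
            stack.append(left + right)
        elif op is Op.MUL:
            right, left = stack.pop(), stack.pop()
            stack.append(left * right)
        else:
            raise ValueError(f"unknown opcode {op}")
    return variables


# Bytecode a compiler might emit for: position = initial + rate * 60
code = [
    (Op.LOAD, "initial"),
    (Op.LOAD, "rate"),
    (Op.PUSH_CONST, 60),
    (Op.MUL, None),
    (Op.ADD, None),
    (Op.STORE, "position"),
]
result = run(code, {"initial": 10.0, "rate": 2.0, "position": 0.0})
print(result["position"])  # 130.0
```

The same `code` list runs unchanged wherever `run` is available, which mirrors why bytecode is portable across Win, Mac, and Unix.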
Programs related to compilers…
Assemblers
• Translator for the assembly language
• Assembly code is translated into machine code
• The output is relocatable machine code
Linker
• Links separately assembled object files
• Links object files to standard library functions
• Generates a file that can be loaded and executed
Loader
• Loads the executable code produced by the linker
into main memory
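The division of work between linker and loader can be sketched with a made-up object format. In the hypothetical `ObjectModule` below, an integer is a machine word and a string is a reference to a symbol that may be defined in another module; `link` places the modules one after another, resolves the references, and records which words hold addresses, and `load` relocates the linked image to an absolute load address. Real object formats such as ELF are far richer; the module names, the symbol `printf`, and the word values are invented for illustration.

```python
from dataclasses import dataclass, field

@dataclass
class ObjectModule:
    """A toy relocatable object module (hypothetical format)."""
    name: str
    code: list                                    # int = machine word, str = symbol reference
    symbols: dict = field(default_factory=dict)   # symbol defined here -> offset in this module

def link(modules: list) -> tuple:
    """Lay modules out consecutively from address 0 and resolve symbol references."""
    table, base = {}, 0
    for module in modules:                        # pass 1: assign addresses to symbols
        for name, offset in module.symbols.items():
            if name in table:
                raise ValueError(f"duplicate symbol {name!r}")
            table[name] = base + offset
        base += len(module.code)
    image, relocations = [], []
    for module in modules:                        # pass 2: patch references
        for word in module.code:
            if isinstance(word, str):
                if word not in table:
                    raise ValueError(f"undefined symbol {word!r} in {module.name}")
                relocations.append(len(image))    # this word holds an address
                image.append(table[word])
            else:
                image.append(word)
    return image, relocations

def load(image: list, relocations: list, load_address: int) -> list:
    """Loader: place the image at load_address, turning relative addresses into absolute ones."""
    memory = list(image)
    for position in relocations:
        memory[position] += load_address
    return memory


main = ObjectModule("main.o", [10, "printf", 20], {"main": 0})
lib = ObjectModule("lib.o", [30, 40], {"printf": 0})
image, relocations = link([main, lib])
print(image)                              # [10, 3, 20, 30, 40]
print(load(image, relocations, 100))      # [10, 103, 20, 30, 40]
```

Only the word that holds an address changes at load time; plain instruction words are copied as they are.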
Programs related to compilers…
Figure: C or C++ program → Preprocessor → C/C++ program with macro substitutions (expansions) and file inclusions → Compiler → Assembly code → Assembler → Relocatable object module → Linker (together with other relocatable object modules or library modules) → Executable code → Loader → Absolute machine code
Together, these programs form a language-processing (program execution) system.
The Analysis-Synthesis Model of Compilation
• There are two parts to compilation: analysis and synthesis.
• The analysis part breaks up the source program into constituent
pieces and creates an intermediate representation of the source
program.
The lexical analyzer, syntax analyzer, and semantic
analyzer belong to this part.
• The synthesis part constructs the desired target program from
the intermediate representation.
The intermediate code generator, code optimizer, and code
generator belong to this part.
• Of the two parts, synthesis requires the more specialized
techniques.
The Analysis-Synthesis Model of Compilation
Analysis (front end)
• Machine-independent, language-dependent
Synthesis (back end)
• Machine-dependent, language-independent
The Phases of a Compiler
Conceptually, a compiler operates in
phases, each of which transforms the
source program from one representation
to another.
The first three phases form the bulk
of the analysis portion of a compiler.
Two other activities, symbol-table
management and error handling,
interact with the six phases:
lexical analysis, syntax analysis,
semantic analysis, intermediate code
generation, code optimization, and code
generation.
• During analysis, the operations implied by the source
program are determined and recorded in a hierarchical
structure called a tree.
• Often, a special kind of tree called a syntax tree is
used, in which each node represents an operation and
the children of a node represent the arguments of the
operation.
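• For example, in the syntax tree for the assignment statement
position = initial + rate * 60, the root = has the children position
and +; the + node has the children initial and *; and the * node
has the children rate and 60.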
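A syntax tree of this kind can be represented with two kinds of nodes: interior nodes holding an operation and leaves holding an identifier or a number. The class names `Leaf` and `Node` and the indented-text printer `show` are illustrative choices for this sketch, which builds the tree for position = initial + rate * 60 by hand.

```python
from dataclasses import dataclass
from typing import Union

@dataclass
class Leaf:
    value: str          # an identifier name or a number

@dataclass
class Node:
    op: str             # the operation, e.g. "=", "+", "*"
    left: "Tree"        # first argument of the operation
    right: "Tree"       # second argument of the operation

Tree = Union[Leaf, Node]

def show(tree: Tree, indent: int = 0) -> list[str]:
    """Return the tree as lines of text, children indented under their parent."""
    label = tree.value if isinstance(tree, Leaf) else tree.op
    lines = ["  " * indent + label]
    if isinstance(tree, Node):
        lines += show(tree.left, indent + 1)
        lines += show(tree.right, indent + 1)
    return lines


# Syntax tree for: position = initial + rate * 60
tree = Node("=", Leaf("position"),
            Node("+", Leaf("initial"),
                 Node("*", Leaf("rate"), Leaf("60"))))
print("\n".join(show(tree)))
```

Because * sits below + in the tree, the multiplication is performed first, which is how the tree encodes operator precedence.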
The Analysis-Synthesis Model of Compilation
Analysis of the source program
Analysis consists of three phases:
• Linear (lexical) analysis
• Hierarchical (syntax) analysis
• Semantic analysis
1. Lexical analysis or Scanning
• The lexical analyzer reads the source program character by character
and returns the tokens of the source program.
• A token represents a sequence of characters that has a collective
meaning in the source program (such as identifiers, operators,
keywords, numbers, delimiters, and so on).
• A lexical analyzer is also called a lexer or a scanner.
• It receives a stream of characters from the source program
and groups them into tokens.
• Blanks, newlines, and tab characters are removed during
lexical analysis.
Figure: Source program → Lexical analyzer → Stream of tokens
Lexical Analysis
For example, in lexical analysis the characters in the assignment
statement position = initial + rate * 60 would be grouped into the
following tokens:
1. The identifier position
2. The assignment symbol =
3. The identifier initial
4. The plus sign +
5. The identifier rate
6. The multiplication sign *
7. The number 60
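The grouping above can be reproduced with one regular expression per token class. This sketch uses Python's `re` module, which is a backtracking matcher rather than an explicit finite automaton; the token-class names such as `ID` and `NUMBER` are assumptions of the sketch, and a real scanner would also recognize keywords and many more operators.

```python
import re
from typing import NamedTuple

class Token(NamedTuple):
    kind: str     # token class
    lexeme: str   # the characters that were grouped together

# One regular expression per token class; at each position the first alternative that matches wins.
TOKEN_SPEC = [
    ("NUMBER", r"\d+(?:\.\d+)?"),
    ("ID", r"[A-Za-z_]\w*"),
    ("ASSIGN", r"="),
    ("PLUS", r"\+"),
    ("TIMES", r"\*"),
    ("SKIP", r"[ \t\n]+"),   # blanks, tabs, and newlines are discarded
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))

def tokenize(text: str) -> list[Token]:
    """Group the characters of text into tokens, dropping white space."""
    tokens, pos = [], 0
    while pos < len(text):
        m = MASTER.match(text, pos)
        if m is None:
            raise ValueError(f"illegal character {text[pos]!r} at position {pos}")
        if m.lastgroup != "SKIP":
            tokens.append(Token(m.lastgroup, m.group()))
        pos = m.end()
    return tokens


tokens = tokenize("position = initial + rate * 60")
for token in tokens:
    print(token.kind, token.lexeme)
```

The seven printed tokens correspond one-to-one with the list above, and the blanks between them never reach the parser.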
• Regular expressions are used to describe tokens (lexical constructs).
• A (Deterministic) Finite State Automaton can be used in the
implementation of a lexical analyzer.
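A DFA for two of the token classes above, identifiers (a letter or underscore followed by letters, digits, or underscores) and unsigned integers (one or more digits), can be written as a transition table. The state names and the `classify` function are hypothetical; this sketch checks one whole lexeme, whereas a full scanner would run the automaton over the input and keep the longest prefix that ends in an accepting state.

```python
# States of a small hand-built DFA.
START, IN_ID, IN_NUM = "start", "in_id", "in_num"
ACCEPTING = {IN_ID: "ID", IN_NUM: "NUMBER"}   # accepting state -> token class

# Transition table: (state, character class) -> next state. Missing entries mean "reject".
TRANSITIONS = {
    (START, "letter"): IN_ID,
    (START, "digit"): IN_NUM,
    (IN_ID, "letter"): IN_ID,
    (IN_ID, "digit"): IN_ID,
    (IN_NUM, "digit"): IN_NUM,
}

def char_class(c: str) -> str:
    """Map a character to the input class used by the transition table."""
    if (c.isascii() and c.isalpha()) or c == "_":
        return "letter"
    if c.isascii() and c.isdigit():
        return "digit"
    return "other"

def classify(lexeme: str):
    """Run the DFA over the whole lexeme; return its token class, or None if rejected."""
    state = START
    for c in lexeme:
        state = TRANSITIONS.get((state, char_class(c)))
        if state is None:        # no transition: the implicit dead state
            return None
    return ACCEPTING.get(state)


for text in ["position", "rate", "60", "6x", "+"]:
    print(text, classify(text))
```

The table-driven form is what scanner generators produce from regular expressions: the loop stays the same and only the table changes.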
2. Syntax analysis or Parsing
• The parser receives the source code in the form of tokens
from the scanner and performs syntax analysis.
• The results of syntax analysis are usually represented by a
parse tree or a syntax tree.
• In a syntax tree, each interior node represents an operation
and the children of the node represent the arguments of the
operation.
• The syntactic structure of a programming language is
specified by a context-free grammar (CFG).
Figure: Stream of tokens → Syntax analyzer → Abstract syntax tree
Syntax analysis or Parsing…
• Example: consider the parse tree for the C statement position = initial + rate * 60.
Figure: Parse tree for position = initial + rate * 60
Usually, the grammatical phrases of the source program are
represented by a parse tree.
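The grammatical structure of the example can be recovered by a recursive-descent parser. This sketch assumes a small grammar that the slides do not spell out: assign → id = expr, expr → term { + term }, term → factor { * factor }, factor → id | number | ( expr ). Each nonterminal becomes one method, and because * is handled one level below +, multiplication binds tighter. The parser returns the compact syntax tree as nested tuples rather than a full parse tree with nonterminal labels; all names are illustrative.

```python
import re

TOKEN = re.compile(r"\s*(\d+|[A-Za-z_]\w*|[=+*()])")
IDENT = re.compile(r"[A-Za-z_]\w*")

def tokenize(text: str) -> list:
    """Split text into numbers, identifiers, and the symbols = + * ( )."""
    tokens, pos, text = [], 0, text.rstrip()
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if m is None:
            raise ValueError(f"illegal character at position {pos}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens

class Parser:
    def __init__(self, tokens: list):
        self.tokens, self.pos = tokens, 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def advance(self, expected=None):
        """Consume the next token, checking it against `expected` if given."""
        current = self.peek()
        if current is None or (expected is not None and current != expected):
            raise ValueError(f"expected {expected or 'more input'}, found {current!r}")
        self.pos += 1
        return current

    def assign(self):                    # assign -> id '=' expr
        target = self.advance()
        if not IDENT.fullmatch(target):
            raise ValueError(f"expected an identifier, found {target!r}")
        self.advance("=")
        tree = ("=", target, self.expr())
        if self.peek() is not None:
            raise ValueError(f"unexpected token {self.peek()!r}")
        return tree

    def expr(self):                      # expr -> term { '+' term }
        tree = self.term()
        while self.peek() == "+":
            self.advance("+")
            tree = ("+", tree, self.term())
        return tree

    def term(self):                      # term -> factor { '*' factor }
        tree = self.factor()
        while self.peek() == "*":
            self.advance("*")
            tree = ("*", tree, self.factor())
        return tree

    def factor(self):                    # factor -> id | number | '(' expr ')'
        if self.peek() == "(":
            self.advance("(")
            tree = self.expr()
            self.advance(")")
            return tree
        token = self.advance()
        if token in {"=", "+", "*", ")"}:
            raise ValueError(f"unexpected token {token!r}")
        return token

def parse(text: str):
    return Parser(tokenize(text)).assign()


print(parse("position = initial + rate * 60"))
# ('=', 'position', ('+', 'initial', ('*', 'rate', '60')))
```

The loops in `expr` and `term` also make + and * left-associative, so a = b + c + d groups as (b + c) + d.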
3. Semantic analysis
• The semantics of a program are its meaning, as opposed
to its syntax or structure.
• The semantics consist of:
– Runtime semantics: the behavior of the program at run time
– Static semantics: checked by the compiler
• Static semantics include:
– Declaration of variables and constants before use
– Calling functions that exist (predefined in a library or defined by
the user)
– Passing parameters properly
– Type checking
• The semantic analyzer:
– Checks the static semantics of the language
– Annotates the syntax tree with type information
Semantic analysis…
Example: consider again the C statement position = initial + rate * 60.
Figure: Semantic analysis inserts a conversion from integer to real (inttoreal) for the integer constant 60, since position, initial, and rate are real variables.
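The type check that inserts inttoreal can be sketched as a recursive walk over the syntax tree. This sketch assumes the tree is written as nested tuples, that position, initial, and rate are declared real in a hypothetical symbol-table dictionary, that a lexeme made only of digits is an integer constant, and that the only implicit conversion allowed is int to real. It returns the tree with conversion nodes inserted, together with the tree's type; a fuller implementation would store the type in every node.

```python
def coerce(tree, actual: str, wanted: str):
    """Return tree unchanged, or wrapped in an inttoreal node when an int must become real."""
    if actual == wanted:
        return tree
    if actual == "int" and wanted == "real":
        return ("inttoreal", tree)
    raise TypeError(f"cannot convert {actual} to {wanted}")

def check(tree, symbols: dict):
    """Check static semantics; return (tree with conversions, type of the tree)."""
    if isinstance(tree, str):                       # a leaf
        if tree.isdigit():
            return tree, "int"                      # integer constant
        if tree not in symbols:
            raise NameError(f"undeclared identifier {tree!r}")
        return tree, symbols[tree]                  # declared type from the symbol table
    op, left, right = tree
    left_tree, left_type = check(left, symbols)
    right_tree, right_type = check(right, symbols)
    if op == "=":
        # The value must be converted to the declared type of the assigned variable.
        return (op, left_tree, coerce(right_tree, right_type, left_type)), left_type
    result = "real" if "real" in (left_type, right_type) else "int"
    return (op, coerce(left_tree, left_type, result),
            coerce(right_tree, right_type, result)), result


symbols = {"position": "real", "initial": "real", "rate": "real"}
tree = ("=", "position", ("+", "initial", ("*", "rate", "60")))
annotated, _ = check(tree, symbols)
print(annotated)
# ('=', 'position', ('+', 'initial', ('*', 'rate', ('inttoreal', '60'))))
```

Only the constant 60 is wrapped, because it is the only int operand meeting a real one; an undeclared name raises an error instead.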
Synthesis of the target program
• Intermediate code generator
• Target code (program) generator
Intermediate code generator
• Comes after syntax and semantic analysis
• The intermediate representation (IR) should have two important
properties:
– It should be easy to produce
– It should be easy to translate into the target program
• The IR can take a variety of forms:
– Three-address code, postfix notation, tree or DAG representation
– The most commonly used representation is three-address code
Figure: Abstract syntax tree → Intermediate code generator → Intermediate code
The three-address code for the C statement position = initial + rate * 60
(where id1, id2, and id3 denote position, initial, and rate) is:
temp1 = inttoreal(60)
temp2 = id3 * temp1
temp3 = id2 + temp2
id1 = temp3
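These four instructions can be produced by a post-order walk of the annotated tree that creates a new temporary for every interior node. The sketch assumes the tree is written as nested tuples with an inttoreal node already inserted by semantic analysis, and uses a hypothetical `ids` mapping from source names to the symbol-table entries id1, id2, and id3; `gen_tac` is an illustrative name.

```python
from itertools import count

def gen_tac(tree, ids: dict) -> list:
    """Translate an assignment tree into a list of three-address instructions."""
    code, temps = [], count(1)

    def address(node) -> str:
        """Emit code for node (children first) and return the name holding its value."""
        if isinstance(node, str):
            return ids.get(node, node)          # identifier -> idN; constants stay as written
        if node[0] == "inttoreal":
            arg = address(node[1])
            temp = f"temp{next(temps)}"
            code.append(f"{temp} = inttoreal({arg})")
            return temp
        op, left, right = node
        a, b = address(left), address(right)
        temp = f"temp{next(temps)}"
        code.append(f"{temp} = {a} {op} {b}")
        return temp

    op, target, value = tree
    if op != "=":
        raise ValueError("expected an assignment at the root")
    result = address(value)
    code.append(f"{address(target)} = {result}")
    return code


annotated = ("=", "position", ("+", "initial", ("*", "rate", ("inttoreal", "60"))))
ids = {"position": "id1", "initial": "id2", "rate": "id3"}
for instruction in gen_tac(annotated, ids):
    print(instruction)
```

Each instruction has at most one operator on the right-hand side, which is what makes the code "three-address".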
Code generator
• The machine code generator receives the (optimized)
intermediate code, and then it produces either:
– Machine code for a specific machine, or
– Assembly code for a specific machine and assembler.
• Code generator
– Selects appropriate machine instructions
– Allocates memory locations for variables
– Allocates registers for intermediate computations
Code generator…
• The code generator takes the IR code and generates code for the
target machine.
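• Here we write the target code in assembly language for:
position = initial + rate * 60
NB: The input is the three-address code from the previous slide after optimization: the conversion inttoreal(60) is done at compile time (giving the constant 60.0) and temp3 is eliminated, leaving
temp1 = id3 * 60.0
id1 = id2 + temp1
• Using registers R1 and R2, the translation of the example is:
MOV id3, R2
MUL #60.0, R2
MOV id2, R1
ADD R2, R1
MOV R1, id1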
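A very simple code generator for this target can be sketched as follows. It assumes that the optimizer has already reduced the three-address code to temp1 = id3 * 60.0 and id1 = id2 + temp1 (the form the MUL #60.0 instruction reflects), that every instruction has the shape dest = arg1 op arg2, that names starting with temp are temporaries used exactly once, and that constants are written as immediates with #. The free-register stack is a deliberately naive allocation strategy, and `gen_code` and `operand` are hypothetical names.

```python
OPCODES = {"+": "ADD", "-": "SUB", "*": "MUL", "/": "DIV"}

def operand(name: str, reg_of: dict) -> str:
    """Return how an instruction refers to a value: register, immediate, or memory name."""
    if name in reg_of:
        return reg_of[name]                  # temporary already held in a register
    if name.replace(".", "", 1).isdigit():
        return f"#{name}"                    # numeric constant as an immediate operand
    return name                              # variable in memory

def gen_code(quads: list) -> list:
    """Translate (dest, op, arg1, arg2) instructions, meaning dest = arg1 op arg2."""
    free = ["R1", "R2"]                      # free-register stack; pop() hands out R2 first
    reg_of, code = {}, []
    for dest, op, a, b in quads:
        reg = free.pop()
        code.append(f"MOV {operand(a, reg_of)}, {reg}")            # load the first operand
        code.append(f"{OPCODES[op]} {operand(b, reg_of)}, {reg}")  # reg = reg op second operand
        for used in (a, b):                  # a temporary is dead after its single use
            if used in reg_of:
                free.append(reg_of.pop(used))
        if dest.startswith("temp"):
            reg_of[dest] = reg               # keep the temporary in its register
        else:
            code.append(f"MOV {reg}, {dest}")  # store the result into the variable
            free.append(reg)
    return code


optimized = [("temp1", "*", "id3", "60.0"), ("id1", "+", "id2", "temp1")]
for instruction in gen_code(optimized):
    print(instruction)
```

Keeping temp1 in R2 instead of storing it to memory is the simplest form of "allocating registers for intermediate computations"; with more registers or longer expressions, a real generator would need a proper allocation strategy.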
Major Data Structures in a Compiler
• Token
– Represented by an integer value or an
enumeration literal
– Sometimes it is necessary to preserve the string
of characters that was scanned, for example, the
name of an identifier or the value of a literal
• Syntax Tree
– Constructed as a pointer-based structure
– Dynamically allocated as parsing proceeds
– Nodes have fields containing information
collected by the parser and the semantic analyzer
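In Python, the token and syntax-tree representations described above might look like the following sketch. `TokenKind`, `Token`, and `TreeNode` are hypothetical names: the enumeration gives each token class an integer value, the `lexeme` field preserves the scanned text where it matters, and each node keeps references (Python's counterpart of pointers) to its children plus a field that the semantic analyzer fills in later.

```python
from dataclasses import dataclass, field
from enum import Enum, auto
from typing import Optional

class TokenKind(Enum):
    """Token classes; auto() numbers them 1, 2, 3, ..."""
    ID = auto()
    NUMBER = auto()
    ASSIGN = auto()
    PLUS = auto()
    TIMES = auto()

@dataclass(frozen=True)
class Token:
    kind: TokenKind        # the enumeration literal the parser tests
    lexeme: str = ""       # preserved text, e.g. an identifier's name or a literal's value

@dataclass
class TreeNode:
    label: str                                     # operator or leaf text
    children: list = field(default_factory=list)   # references to child nodes
    data_type: Optional[str] = None                # filled in later by the semantic analyzer


# Tokens for "rate * 60" and the subtree a parser could build from them.
tokens = [Token(TokenKind.ID, "rate"), Token(TokenKind.TIMES), Token(TokenKind.NUMBER, "60")]
node = TreeNode("*", [TreeNode(tokens[0].lexeme), TreeNode(tokens[2].lexeme)])
node.data_type = "real"   # e.g. recorded during semantic analysis
print(TokenKind.ID.value, node)
```

The * token keeps no lexeme because its kind alone says everything the parser needs, while the identifier and the number must keep their text.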
Major Data Structures in a Compiler…
• Symbol Table
– Keeps information associated with all kinds of
tokens: identifiers, numbers, variables, functions,
parameters, types, fields, etc.
– Entries are inserted by the scanner and the parser
– The code generation and optimization phases use the
information in the symbol table
• Performance Issues
– Insertion, deletion, and search operations need to
be efficient because they are frequent
– More than one symbol table may be used
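One common way to meet these performance requirements is to implement each table as a hash table, with one table per scope. The `SymbolTable` class below is a hypothetical sketch built on Python dictionaries (hash tables with average constant-time insertion and lookup); leaving a scope deletes all of that scope's entries at once, and lookup searches from the innermost scope outward. The attribute names `kind` and `type` are illustrative.

```python
class SymbolTable:
    """A stack of hash tables (Python dicts), one per scope; the innermost scope is last."""

    def __init__(self):
        self.scopes = [{}]                      # start with the global scope

    def enter_scope(self) -> None:
        self.scopes.append({})

    def exit_scope(self) -> None:
        if len(self.scopes) == 1:
            raise RuntimeError("cannot leave the global scope")
        self.scopes.pop()                       # deletes all entries of the inner scope at once

    def insert(self, name: str, **attributes) -> None:
        scope = self.scopes[-1]
        if name in scope:
            raise ValueError(f"{name!r} is already declared in this scope")
        scope[name] = attributes

    def lookup(self, name: str):
        for scope in reversed(self.scopes):     # the innermost declaration wins
            if name in scope:
                return scope[name]
        return None


table = SymbolTable()
table.insert("rate", kind="variable", type="real")
table.enter_scope()
table.insert("rate", kind="parameter", type="int")   # shadows the global entry
print(table.lookup("rate"))   # {'kind': 'parameter', 'type': 'int'}
table.exit_scope()
print(table.lookup("rate"))   # {'kind': 'variable', 'type': 'real'}
```

Using one table per scope is one reason more than one symbol table may exist during compilation.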