2. Compiler
• A compiler is a program (or set of programs) that translates statements written in a high-level programming language into a lower-level (machine) language.
• Examples: compilers for C, C++, and Java.
[Figure: Source Program → Compiler → Target Program (Object Code); the compiler also reports error messages]
P.Kuppusamy - Compiler Design
3. Why Should We Study Compiler Design?
• A computer professional typically deals with two things: high-level programming and computer hardware.
• The compiler connects these two components by acting as a translator.
• Hence, software professionals should understand compilation techniques to see how programming languages and computers fit together.
4. Applications of Compiler Technology
• Parsers for HTML in web browsers
• Interpreters for JavaScript/Flash
• Machine code generation for high-level languages
• Software testing
• Program optimization
• Malicious code detection
• Design of new computer architectures
  o Compiler-in-the-loop hardware development
• Hardware synthesis: translation from VHDL (VHSIC Hardware Description Language, where VHSIC stands for Very High-Speed Integrated Circuit) to RTL (register-transfer level)
• Compiled simulation
  o Used to simulate designs written in VHDL
  o The design is compiled rather than interpreted, which increases simulation speed
5. Complexity of Compiler Technology
• A compiler is among the most complex pieces of system software.
• Writing a compiler is a substantial exercise in software engineering.
• The complexity arises from mapping a programmer's requirements (expressed in an HLL program) to architectural details.
• It uses algorithms and techniques from computer science.
• It translates intricate theory into practice, which enables tool building.
6. Nature of Compiler Algorithms
• Draws results from mathematical logic, lattice theory, linear algebra, probability, etc.
  o Applied in type checking, static analysis, dependence analysis and loop parallelization, cache analysis, etc.
• Makes practical application of:
  o Greedy algorithms - register allocation
  o Heuristic search - list scheduling
  o Graph algorithms - dead code elimination (removing unused code from a program), register allocation
  o Dynamic programming - instruction selection
  o Optimization techniques - instruction scheduling
  o Finite automata - lexical analysis
  o Pushdown automata - parsing
  o Fixed-point algorithms - data-flow analysis
  o Complex data structures - symbol tables, parse trees, data dependence graphs
7. Additional Uses of Scanning and Parsing Techniques
• Assembler implementation
• Online text searching (grep, awk) and word processing
• Website filtering
• Command language interpreters
• Interpretation of scripting languages, which control one or more applications without a compilation step (Unix shell, Perl, Python)
• XML parsing and document tree construction
• Database query interpreters
8. Additional Uses of Program Analysis Techniques
• Converting a sequential loop to a parallel loop
• Program analysis to determine whether programs are free of data races (conflicting concurrent accesses)
• Profiling programs to determine busy regions (profiling measures a program's space or time usage, the use of particular instructions, and the frequency and duration of function calls)
• Program slicing (used in debugging and testing) to locate errors easily
• Data-flow analysis approach to software testing
  o Uncovering errors along all paths
  o Dereferencing null pointers
  o Buffer overflows and memory leaks
• Worst-case execution time (the maximum time to execute a task on a given hardware platform) estimation and energy analysis
9. TRANSLATOR
• A translator is a program that takes as input a program written in one language and produces as output a program in another language.
• The translator also performs error detection.
• Ex: compiler, interpreter
• The important roles of a translator are:
1. Translating the HLL program input into an equivalent machine-language program.
2. Providing diagnostic (error) messages wherever the programmer violates the specification of the HLL.
10. (Programming) Language processing system
To create an executable target program, several components are required.
[Figure: Source Program (collected from modules, macros, etc.) → Preprocessor → Modified Source Program → Compiler → Target Assembly Program → Assembler → Relocatable machine code → Linker / Loader (which also takes library files and relocatable object files) → Target machine code]
11. Preprocessor
• A preprocessor produces input for the compiler.
• It performs the following functions:
1. Macro processing: A preprocessor may allow a user to define macros that are shorthand for longer constructs. Ex. #define PI 3.14
2. File inclusion: A preprocessor may include header files in the program text. Ex. #include <stdio.h>, #include "customize.h"
3. Rational preprocessors: These preprocessors augment older languages with more modern flow-of-control and data-structuring facilities.
4. Language extensions: These preprocessors attempt to add capabilities to the language by what amounts to built-in macros.
[Figure: Source Program → Preprocessor → Modified Source Program]
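Macro processing can be illustrated with a small sketch. The function below is a toy model written for this note, not the real C preprocessor: it assumes only object-like macros of the form `#define NAME value` and ignores file inclusion, function-like macros, and conditional directives. The name `preprocess` is hypothetical.

```python
import re


def preprocess(source: str) -> str:
    """Expand object-like #define macros (toy model, assumptions as stated above)."""
    macros = {}
    output = []
    for line in source.splitlines():
        match = re.match(r"\s*#define\s+(\w+)\s+(.+)", line)
        if match:
            # Record the macro; the directive itself is dropped from the output.
            macros[match.group(1)] = match.group(2).strip()
            continue
        for name, body in macros.items():
            # \b ensures only whole identifiers are replaced (PI, but not PIE).
            line = re.sub(rf"\b{re.escape(name)}\b", lambda _m, b=body: b, line)
        output.append(line)
    return "\n".join(output)


print(preprocess("#define PI 3.14\narea = PI * r * r;"))
```

The output of this step corresponds to the "modified source program" that is handed to the compiler.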
12. Structure of Compiler
• A compiler translates a program written in a high-level language (the source program) into an equivalent program in machine-level language (the target program).
• An important part of a compiler is reporting errors to the programmer.
[Figure: Source Program → Compiler → Target Program, with error messages reported by the compiler; the target program takes Input and produces Output. Within the language processing system, the compiler takes the Modified Source Program and produces the Target Assembly Program.]
13. Execution process of source program in Compiler
• Executing a program written in a high-level programming language basically involves two steps.
  o The source program must first be translated into an object program.
  o The resulting object program is then loaded into memory and executed.
[Figure: Source Program → Compiler → Object (Target) Program; the target program, loaded in memory, takes Input and produces Output]
14. ASSEMBLER
• Machine language is difficult to write and read, even for professionals.
• So mnemonics (symbols) are used for each machine instruction and are subsequently translated into machine language. Such a mnemonic machine language is called an assembly language.
• An assembler translates assembly language into machine language. The input to an assembler is called the source program; the output is a machine-language translation (relocatable machine code / object program).
• The object code contains not only machine-level instructions but also information about hardware registers, the memory addresses of run-time memory (RAM) segments, system resources, read-write permissions, etc.
• Code that can be loaded into any portion of RAM and still run is called relocatable; otherwise it is non-relocatable.
[Figure: Target Assembly Program → Assembler → Relocatable machine code]
15. INTERPRETER
• An interpreter directly executes instructions written in a programming language without first converting them to object code or machine code. Ex. BASIC, SNOBOL, Lisp, Perl, Python, MATLAB, and also Java (whose compiled bytecode is interpreted by the Java Virtual Machine)
• Interpretation can be carried out in the following phases:
1. Lexical analysis
2. Syntax analysis
3. Semantic analysis
4. Direct execution
[Figure: an interpreter takes the Source Program and Input together and produces Output and error messages; by contrast, a compiler takes the Source Program and produces an Object (Target) Program]
16. INTERPRETER
Advantages:
• Portable
• An interpreter makes debugging simpler because it checks the source code immediately.
• It uses less memory than an executable file because only a few lines of source code need to be in memory at any moment.
• The interpreter for the language makes the program machine independent.
Disadvantages:
• Program execution is slower.
17. LINKER / LOADER:
LINKER:
• The linker receives relocatable machine code, generates the executable target machine code for the program, and hands it over to the loader.
• A source program may use header files and library functions whose definitions are stored in built-in libraries.
• The linker links these function references to their definitions in the built-in libraries. If a definition cannot be found, the linker reports an error (an unresolved reference).
• Large programs are also divided into subprograms called modules, which are compiled and assembled separately to generate object modules.
• The linker is responsible for combining all the object modules into a single executable file for the source program.
[Figure: Relocatable machine code, built-in libraries, and relocatable modules → Linker → Executable target machine code]
18. LINKER / LOADER:
LOADER:
• A program must reside in the computer's main memory while it is being executed.
• The loader is an operating-system program that loads the executable file/module of a program into main memory for execution.
• It allocates storage space in main memory for the executable module.
[Figure: Executable target machine code → Loader → loaded into memory (RAM) for execution]
19. LIST OF COMPILERS
• Ada compilers
• ALGOL compilers
• BASIC compilers
• C# compilers
• C compilers
• C++ compilers
• COBOL compilers
• Common Lisp compilers
• ECMAScript interpreters
• Fortran compilers
• Java compilers
• Pascal compilers
• PL/I compilers
• Python compilers
• Smalltalk compilers
22. Working Principle
[Figure: Source Program (character stream) → Scanner → Tokens → Parser → Syntactic Structure (tree) → Semantic Analyzer → Annotated Syntactic Structure → Intermediate Code Generator → Intermediate Representation → Optimizer → Optimized Intermediate Representation → Code Generator → Target machine code. The symbol and attribute tables are used by all phases of the compiler.]
23. Scanner
• The scanner analyzes the source program by reading it character by character.
• It then groups the characters into individual words and symbols (tokens).
• It removes comments and white space.
• It uses:
  o RE (regular expressions)
  o NFA (non-deterministic finite automata)
  o DFA (deterministic finite automata)
  o LEX (a scanner generator program)
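The scanner's job can be sketched with regular expressions. This is a minimal illustration, not how LEX-generated scanners are implemented internally: the token names in `TOKEN_SPEC`, the `//` comment syntax, and the function name `scan` are assumptions made for the example.

```python
import re

# Hypothetical token classes; COMMENT is listed before OP so that "//" is
# not mistaken for two division operators.
TOKEN_SPEC = [
    ("COMMENT", r"//[^\n]*"),
    ("NUMBER", r"\d+(?:\.\d+)?"),
    ("ID", r"[A-Za-z_]\w*"),
    ("OP", r"[+\-*/=()]"),
    ("SKIP", r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def scan(text):
    """Group characters into (kind, lexeme) tokens, dropping comments and white space."""
    tokens = []
    pos = 0
    while pos < len(text):
        match = MASTER.match(text, pos)
        if match is None:
            # Lexical error: no token can be formed from this character.
            raise SyntaxError(f"cannot form a token at position {pos}: {text[pos]!r}")
        if match.lastgroup not in ("SKIP", "COMMENT"):
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens


print(scan("position = initial + rate * 60  // update position"))
```

Python's `re` module compiles each pattern into a matcher for us; a generated scanner would instead run a DFA built from the same regular expressions.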
24. Parser (Syntax Analyzer)
• The parser reads tokens and groups them into units as specified by the productions of the CFG (context-free grammar).
• As syntactic structure is recognized, the parser either calls the corresponding semantic analyzer directly or builds a syntax tree.
• It uses:
  o CFG (context-free grammar)
  o BNF (Backus-Naur Form)
  o GAA (grammar analysis algorithms)
  o LL, LR, SLR (Simple LR), and LALR (Look-Ahead LR) parsers. In LR, L stands for left-to-right scanning of the input and R for constructing a rightmost derivation in reverse.
  o YACC (Yet Another Compiler Compiler)
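As a rough illustration of grouping tokens according to grammar productions, here is a hand-written recursive-descent (top-down) parser. It is a sketch under assumptions: the three-production grammar in the docstring was chosen for the example, the tree is represented as nested tuples, and the name `parse` is hypothetical. Tools such as YACC generate bottom-up LALR parsers instead, which work differently.

```python
import re


def parse(text):
    """Build a syntax tree for the assumed grammar:
         expr   -> term ('+' term)*
         term   -> factor ('*' factor)*
         factor -> ID | NUMBER | '(' expr ')'
    """
    # Simplified tokenization; characters outside the grammar are ignored.
    tokens = re.findall(r"[A-Za-z_]\w*|\d+|[+*()]", text)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        token = peek()
        pos += 1
        return token

    def expr():
        node = term()
        while peek() == "+":
            advance()
            node = ("+", node, term())
        return node

    def term():
        node = factor()
        while peek() == "*":
            advance()
            node = ("*", node, factor())
        return node

    def factor():
        token = advance()
        if token == "(":
            node = expr()
            if advance() != ")":
                raise SyntaxError("expected ')'")
            return node
        if token is None or token in ("+", "*", ")"):
            raise SyntaxError(f"unexpected token {token!r}")
        return token  # identifier or number leaf

    tree = expr()
    if peek() is not None:
        raise SyntaxError(f"unexpected token {peek()!r}")
    return tree


print(parse("initial + rate * 60"))
```

Because `term` is called from inside `expr`, multiplication binds tighter than addition, so `rate * 60` becomes a subtree of the `+` node.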
25. Semantic Analyzer (heart of a compiler)
• Performs type checking and identifier declaration checking.
• Checks the static semantics of each construct, e.g. that each operator is applied to operands the language permits.
• Verifies that the parse tree is semantically meaningful.
• Translation methods are:
  o Syntax-directed translation
  o Semantic processing techniques
  o IR (intermediate representation)
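Type checking and declaration checking can be sketched as a walk over the syntax tree. The tree format (nested tuples with identifier names as strings and literals as Python numbers), the type names, and the function `check` are assumptions made for this example only.

```python
def check(node, symbols):
    """Return the type of an expression tree, or raise on a semantic error."""
    if isinstance(node, tuple):
        op, left, right = node
        left_type = check(left, symbols)
        right_type = check(right, symbols)
        for operand_type in (left_type, right_type):
            if operand_type not in ("int", "real"):
                raise TypeError(f"operator {op!r} cannot be applied to {operand_type}")
        # Mixed int/real arithmetic is converted to real.
        return "real" if "real" in (left_type, right_type) else "int"
    if isinstance(node, float):
        return "real"
    if isinstance(node, int):
        return "int"
    if node not in symbols:
        raise NameError(f"identifier {node!r} is not declared")
    return symbols[node]


symbols = {"initial": "real", "rate": "real", "table": "array", "proc": "procedure"}
print(check(("+", "initial", ("*", "rate", 60)), symbols))
```

With this table, an expression such as `("+", "table", "proc")` is rejected with a `TypeError`, and a reference to an undeclared name is rejected with a `NameError`.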
26. Intermediate Code Generator (ICG)
• The most popular format is three-address code.
• Every statement (line) contains at most three addresses.
• Fewer than three are also allowed.
• Ex:
x = a + b * c contains 4 addresses.
The ICG reduces it to statements of at most 3 addresses:
y = b * c
x = a + y
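The reduction to three-address code can be sketched as follows. To keep the example short it borrows Python's own `ast` module as the parser, which is an assumption of convenience (the statement must therefore be valid Python syntax); the function name `three_address` and the temporary names `t1`, `t2`, ... are hypothetical.

```python
import ast

OPS = {ast.Add: "+", ast.Sub: "-", ast.Mult: "*", ast.Div: "/"}


def three_address(statement):
    """Translate one assignment into statements with at most three addresses."""
    code = []
    counter = 0

    def new_temp():
        nonlocal counter
        counter += 1
        return f"t{counter}"

    def operand(node):
        # Names and constants are already single addresses.
        if isinstance(node, ast.Name):
            return node.id
        if isinstance(node, ast.Constant):
            return str(node.value)
        if isinstance(node, ast.BinOp):
            # A nested operation is computed into a fresh temporary first.
            left = operand(node.left)
            right = operand(node.right)
            temp = new_temp()
            code.append(f"{temp} = {left} {OPS[type(node.op)]} {right}")
            return temp
        raise ValueError("unsupported expression")

    assign = ast.parse(statement).body[0]
    target = assign.targets[0].id
    value = assign.value
    if isinstance(value, ast.BinOp):
        left = operand(value.left)
        right = operand(value.right)
        code.append(f"{target} = {left} {OPS[type(value.op)]} {right}")
    else:
        code.append(f"{target} = {operand(value)}")
    return code


print(three_address("x = a + b * c"))
```

For `x = a + b * c` this produces the same two statements as the slide's example, with the temporary named `t1` instead of `y`.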
27. Optimizer
• The IR code is analyzed and transformed (optimized) into functionally equivalent code with fewer instructions.
• The resulting output runs faster and needs less space.
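One simple optimization, constant folding, gives a feel for how the optimizer shortens intermediate code. This is a sketch of a single technique, not a full optimizer. It assumes instructions are tuples `(dest, left, op, right)`, and that a destination whose value is folded is a temporary used only by later instructions in the same block; the name `fold_constants` is hypothetical.

```python
import operator

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def fold_constants(code):
    """Evaluate constant operations at compile time and drop their instructions."""
    known = {}   # name -> constant value computed at compile time
    result = []
    for dest, left, op, right in code:
        # Substitute operands whose constant value is already known.
        left = known.get(left, left)
        right = known.get(right, right)
        if isinstance(left, (int, float)) and isinstance(right, (int, float)):
            known[dest] = OPS[op](left, right)
        else:
            known.pop(dest, None)  # dest no longer holds a known constant
            result.append((dest, left, op, right))
    return result


code = [
    ("t1", 60, "*", 60),
    ("t2", "rate", "*", "t1"),
    ("position", "initial", "+", "t2"),
]
print(fold_constants(code))
```

Here three instructions become two: `t1` is computed at compile time and its value is substituted into the instruction that uses it.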
28. Code Generator
• Converts the optimized intermediate code into target machine code, emitted here in a form that an assembler can understand.
• Ex.
mul a, b
mov a, x
29. Symbol-table Management
• A data structure used to record information about identifiers, such as their attributes and types.
• Records the identifiers in the source program
  o An identifier is detected by lexical analysis and stored in the symbol table
• Collects the attributes of identifiers
  o Storage allocation (memory address), types of attributes
  o Scope (where it is valid: local or global)
  o Arguments (in the case of functions)
    o Number and types of arguments
    o Passing method (e.g. call by reference)
    o Return types
• Semantic analysis uses the type information to check the types of identifiers
• Code generation uses the storage allocation information to generate proper relocatable address code
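A symbol table with scopes can be sketched as a stack of dictionaries. This is one common arrangement, offered as an illustration; the class name `SymbolTable`, its method names, and the attribute keys `type` and `address` are assumptions for the example.

```python
class SymbolTable:
    """Stack of scopes; each scope maps an identifier to its attributes."""

    def __init__(self):
        self.scopes = [{}]  # scopes[0] is the global scope

    def enter_scope(self):
        self.scopes.append({})

    def exit_scope(self):
        self.scopes.pop()

    def declare(self, name, **attributes):
        scope = self.scopes[-1]
        if name in scope:
            raise NameError(f"{name!r} is already declared in this scope")
        scope[name] = attributes

    def lookup(self, name):
        # Search from the innermost scope outward.
        for scope in reversed(self.scopes):
            if name in scope:
                return scope[name]
        raise NameError(f"{name!r} is not declared")


table = SymbolTable()
table.declare("rate", type="real", address=8)
table.enter_scope()
table.declare("rate", type="int", address=0)   # local declaration hides the global one
print(table.lookup("rate")["type"])
table.exit_scope()
print(table.lookup("rate")["type"])
```

The semantic analyzer would read the `type` attribute during type checking, and the code generator would read `address` when emitting instructions.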
30. Error Detection and Reporting
• Lexical phase: the characters read do not form any token
• Syntax and semantic analysis handle a large fraction of errors
  o Syntax phase: tokens violate the structure rules (e.g. of expressions, statements)
  o Semantic phase: an operation has no meaning
    o Ex. adding an array name and a procedure name
31. Grouping the Phases of Compiler
[Figure: the compiler phases grouped into an Analysis Phase (front end) and a Synthesis Phase (back end)]
32. Grouping the Phases of Compiler
The 6 phases are divided into 2 groups.
• First three phases: analysis portion
  o Analysis determines the operations implied by the source program and records them in a tree structure, i.e. it analyzes the source program
• Last three phases: synthesis portion
  o Synthesis takes the tree structure and translates the operations into the target program, i.e. it synthesizes the machine-language program
33. Grouping the Phases of Compiler
[Figure: the phases split into platform-independent phases and platform (machine/language) dependent phases]
• To design a new compiler, we need not design all phases from scratch.
• We can reuse the first 4 phases and design only the last 2 phases.
34. Example
Translation of statement: position = initial + rate * 60
[Figure: Scanner (Lexical Analyzer) → Tokens → Parser (Syntax Analyzer) → Parse tree → Semantic Process (Semantic Analyzer) → Abstract Syntax Tree with Attributes → Intermediate Code Generator → Non-optimized Intermediate Code (three-address code) → Code Optimizer → Optimized Intermediate Code → Target Code Generator → Target machine code]
35. Analysis Phases
• Lexical analysis
  o Groups characters into tokens
    o Identifiers
    o Keywords (if, while)
    o Punctuation ("(", ")")
    o Multi-character operators (":=")
  o Enters the lexical value (lexeme) into the symbol table
    o position, rate, initial
36. Analysis Phases
• Syntax analysis
  o The data structure for fig. (a) is shown in fig. (b).
• Semantic analysis
  o Type checking and conversion
37. Synthesis Phase
Intermediate Code Generation
• Represents the source program as code for an abstract machine
• Should be easy to produce
• Should be easy to translate into the target program
• Three-address code (at most three operands)
  o temp2 := id3 * temp1
• Every memory location can act like a register
  o temp2 → BX
  o BX is known as the base register.
Code Optimization
• Improves the intermediate code
• Yields faster-running machine code
temp1 := id3 * 60.0
id1 := id2 + temp1
Code Generation
• Generates relocatable machine code or assembly code
MOVF id3, R2
MULF #60.0, R2
MOVF id2, R1
ADDF R2, R1
MOVF R1, id1
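The code generation step above can be imitated with a very naive generator. It is a sketch under assumptions: instructions are tuples `(dest, left, op, right)`, names starting with `temp` are temporaries kept in registers, and every instruction gets a fresh register, which a real code generator would not do since registers are limited. Only the `MOVF`, `MULF`, and `ADDF` mnemonics shown above are used; the function name `generate` is hypothetical.

```python
def generate(code):
    """Emit assembly-like lines for optimized three-address code."""
    mnemonic = {"+": "ADDF", "*": "MULF"}
    location = {}   # temporary name -> register currently holding it
    lines = []

    def source(operand):
        if operand in location:
            return location[operand]    # value already sits in a register
        if isinstance(operand, float):
            return f"#{operand}"        # immediate constant
        return operand                  # memory operand such as id3

    for number, (dest, left, op, right) in enumerate(code, start=1):
        reg = f"R{number}"
        lines.append(f"MOVF {source(left)}, {reg}")
        lines.append(f"{mnemonic[op]} {source(right)}, {reg}")
        if dest.startswith("temp"):
            location[dest] = reg        # keep the temporary in its register
        else:
            lines.append(f"MOVF {reg}, {dest}")
    return lines


optimized = [("temp1", "id3", "*", 60.0), ("id1", "id2", "+", "temp1")]
print("\n".join(generate(optimized)))
```

The result has the same shape as the five instructions on the slide, except that the register names R1 and R2 are assigned in the opposite order.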
38. Cousins of the Compiler
• Preprocessors
  o Macro processing
  o File inclusion
    #include <global.h> is replaced by the contents of the file "global.h"
  o Rational preprocessors
  o Language extensions
    Statements beginning with ## belong to a query language embedded in C and are translated into procedure calls
• Assemblers
  o Produce relocatable machine code
DW a #10    // DW (Define Word) initializes memory with one or more word (2-byte) values
DW b #20
MOV a, R1   // Load the contents of address a into R1
ADD #2, R1  // Add the constant 2
MOV R1, b   // Store R1 into address b
39. Cousins of The Compiler
• Two-pass assembler
  o First pass: find all identifiers and their storage locations, and store them in the symbol table
      Identifier   Address
      a            10
      b            20
  o Second pass: translate each operation code into its sequence of bits, producing relocatable machine code
• Loaders and link-editors
  o Link-editor: resolves external references to library files, system-provided routines, or any other program
  o Loader: takes relocatable machine code and alters its relocatable addresses
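The two passes of an assembler can be sketched on a five-line program in the same notation. Everything specific here is an assumption for illustration: the opcode bit patterns in `OPCODES` are made up, addresses are assigned consecutively from 0 in 2-byte words (so they differ from the table above), and the output is a readable string rather than real relocatable machine code.

```python
OPCODES = {"MOV": "0001", "ADD": "0011"}   # hypothetical bit patterns
WORD_SIZE = 2                              # DW defines a 2-byte word


def first_pass(lines):
    """Find all identifiers and assign each a storage location."""
    symbols = {}
    address = 0
    for line in lines:
        parts = line.replace(",", " ").split()
        if parts[0] == "DW":
            symbols[parts[1]] = address
            address += WORD_SIZE
    return symbols


def second_pass(lines, symbols):
    """Translate operation codes to bits and identifiers to addresses."""
    code = []
    for line in lines:
        parts = line.replace(",", " ").split()
        if parts[0] == "DW":
            continue  # data definitions were handled in the first pass
        operands = [str(symbols.get(part, part)) for part in parts[1:]]
        code.append(" ".join([OPCODES[parts[0]], *operands]))
    return code


program = ["DW a #10", "DW b #20", "MOV a, R1", "ADD #2, R1", "MOV R1, b"]
symbols = first_pass(program)
print(symbols)
print(second_pass(program, symbols))
```

Two passes are needed because an instruction may refer to an identifier that is defined later in the file; the first pass guarantees every address is known before translation starts.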