2. Overview
• Translators
• Need for Compiler
• History
• Analysis-Synthesis model of Compilation
• Phases of Compiler
• Bootstrapping of Compiler
• Passes
• Complier Construction Tools
2
Mrs.D.Jena
Catherine
Bel,
AP/CSE,
VEC
3. Translators
• Programs accepting one form of language as input and
produce another form of language as output.
• A few translators are listed below.
• Compiler
• Interpreter
• Assembler
• Preprocessor
• Macro Processor
3
Mrs.D.Jena
Catherine
Bel,
AP/CSE,
VEC
4. Translators
Assembler
Translates assembly language program into machine language
Uses mnemonic codes for operation codes and data addresses.
Preprocessor
A translator whose source language and object language are high
level language is termed as preprocessor.
C Preprocessor-begins with #
Macro processor
A system program that replaces the macro calls in an assembly
language program by the macro definition.
During this, formal arguments are converted into the actual
arguments. 4
Mrs.D.Jena
Catherine
Bel,
AP/CSE,
VEC
5. Translators
Compiler
Translates a high level language (Source language) into a low
level language (Object language)
COBOL, FORTRAN to Assembly language or Machine
language.
Interpreter
Transforms a programming language into a simplified
language called intermediate code which can be directly
executed.
SNOBOL (String Name Oriented symBOlic Language ) to
Polish notation.
5
Mrs.D.Jena
Catherine
Bel,
AP/CSE,
VEC
6. Compilers Vs Interpreters
6
• Avoid Compilation for
every Execution
• Requires less time for
compilation
• Reporting errors in a
whole
• Very complex to design
• Produce intermediate
code directly for execution
– can’t reuse
• Requires large time-
reducing the speed of
execution
• Reports error one-by-one
• Easy to design
Mrs.D.Jena
Catherine
Bel,
AP/CSE,
VEC
7. Need for Compiler
Machine language:
Media that directly communicate with computer
Error prone
Assembly Language:
• Use mnemonic name for operation codes and data address
• Translator assembler helps it
Still it required how specific computer operates
High level programming:
Express algorithms in more natural notation
Without knowing how specific computer works
Translator compiler helps it
7
Mrs.D.Jena
Catherine
Bel,
AP/CSE,
VEC
8. History of compilers
8
• Early computers written in Assembly language
• 1950 - Machine –independent programming
language were introduced - leads to the
development of compilers
• 1952 - first compiler was written by Grace Hopper
for A-0 Programming languages
• Translation is viewed as the compilation of a
sequence of machine language subprograms
selected from library
• 1957 - first complete compiler introduced by
FORTRAN team led by John Backus at IBM
• 1960 – COBOL compiled on multiple architectures
Mrs.D.Jena
Catherine
Bel,
AP/CSE,
VEC
9. History of compilers
• 1962 – first self-hosting compiler – compiling its
own source code by Tim Hart and Mike Levin at
MIT for Lisp
• 1970 – Pascal and C Compilers written in their
own languages
• Building self-hosting compiler creates bootstrapping
problem – avoid by compiling either by a compiler in
different languages or running the compiler in different
languages
9
Mrs.D.Jena
Catherine
Bel,
AP/CSE,
VEC
10. History of compilers
• 1984 – GNU C Compiler by Richard Stallman
• For Unix Operating system
• 1987 – first release of GCC compiler
• First portable ANSI C compiler as free software
• 1992 – version 2.0
• Ability to compile C++
• 1997 – compiler EGCS was created –
• to improve optimization
• 2001 – version 3.0 10
Mrs.D.Jena
Catherine
Bel,
AP/CSE,
VEC
11. History of compilers
• 1995 – java compiler was introduced
• Major java compilers
• javac, included in JDK - open-sourced since nov 2006.
• GCJ - a part of gcc which compiles C, Fortran, Pascal and
other programming languages .
• ECJ - the Eclipse Compiler for Java, is an open source used
by the Eclipse JDT based on IBM Visual Age's Java compiler.
•
11
Mrs.D.Jena
Catherine
Bel,
AP/CSE,
VEC
12. What is a Compiler?
• A compiler – translate human
oriented programming
language(source program) to
computer oriented machine
language (target program).
• A source program/code is a
program/code written in the
source language, which is usually a
high-level language.
• A target program/code is a
program/code written in the target
language, which often is a machine
language or an intermediate code.
12
compiler
Source
program
Target
program
Error
message
Mrs.D.Jena
Catherine
Bel,
AP/CSE,
VEC
13. Analysis and Synthesis Model of
Compilation
13
Two parts of compilation
• Analysis
• breaks up the source program into
constituent pieces (words, phrases)
• creates an intermediate representation of
the source program.
• Synthesis
• performs code optimization,
• constructs the desired target program from
the intermediate representation.
Mrs.D.Jena
Catherine
Bel,
AP/CSE,
VEC
14. Analysis and Synthesis Model of
Compilation
14
In Analysis
From the source
program generate three-
address code via tree
(An intermediate
representation)
Mrs.D.Jena
Catherine
Bel,
AP/CSE,
VEC
15. Analysis and Synthesis Model of
Compilation
Synthesis
From intermediate
code –to sequence of
target instructions
15
Mrs.D.Jena
Catherine
Bel,
AP/CSE,
VEC
16. Tools that Use the Analysis-
Synthesis Model
• Editors (syntax highlighting)
• Uses sequences of command to build the program
• Analyze the program text –and put them in
appropriate hierarchical structure
• Pretty printers (e.g. Doxygen)
• Analyze the program and print them in clearly visible
format
• Comments, indents
16
Mrs.D.Jena
Catherine
Bel,
AP/CSE,
VEC
17. Tools that Use the Analysis-
Synthesis Model
• Static checkers (e.g. Lint and Splint)
• Analyze and identify the errors without running the
program
• Text formatters (e.g. TeX and LaTeX)
• Read input as stream of characters, tables, figures
• Specify a hierarchy of boxes to be filled by some bit pattern
17
Mrs.D.Jena
Catherine
Bel,
AP/CSE,
VEC
18. Tools that Use the Analysis-
Synthesis Model
• Silicon compilers (e.g. VHDL)
• Input is conventional programming language
• Variable represents o’s or 1’s or signal
• Output is circuit
• Query interpreters/compilers (Databases)
• Translate Boolean and relational operators into
commands that enable search in databases
18
Mrs.D.Jena
Catherine
Bel,
AP/CSE,
VEC
19. The phases of a compiler
• Six phases
• Scanner
• Parser
• Semantic Analyzer
• Intermediate Code Generator
• Code Optimizer
• Code generator
• Three auxiliary components
• Literal table
• Symbol table
• Error Handler
19
Compilation process divided into six sub process –
Phases
Take one form of representation of source program and
produce output as another form
Mrs.D.Jena
Catherine
Bel,
AP/CSE,
VEC
20. The phases of a compiler
20
Scanner
Parser
Semantics Analyzer
Intermediate Code Generation
Code Optimizer
Code Generator
Literal
Table
Symbol
Table
Error
Handler
Source code
Syntax Tree
Annotated Tree
Intermediate code
Optimized code
Target code
Tokens
Mrs.D.Jena
Catherine
Bel,
AP/CSE,
VEC
21. The phases of a compiler-
Scanner
Lexical analysis/Scanner:
it collects sequences of characters into meaningful
units called tokens
Regular expressions are useful notations to recognizes
tokens
Install class of a strings type token in symbol table and
return the token identified to the parser having token
type and the pointer to the symbol table
21
Mrs.D.Jena
Catherine
Bel,
AP/CSE,
VEC
22. An example: a[index]=4+2
a identifier
[ left bracket
index identifier
] right bracket
= assignment
4 number
+ plus sign
2 number
Other operations: it may enter literals into the literal
table
22
The phases of a compiler-
Scanner
Mrs.D.Jena
Catherine
Bel,
AP/CSE,
VEC
23. • Syntax analysis/Parser:
• it determines the structure of the program
• Accepts tokens –verify the tokens with patterns that
permitted by the specific language
• The results of syntax analysis are a parse tree or a
syntax tree
• An example: a[index]=4+2;
• Parse tree
• Syntax tree ( abstract syntax tree)
23
The phases of a compiler-Parser
Mrs.D.Jena
Catherine
Bel,
AP/CSE,
VEC
24. The Syntax Tree
24
a index 4 2
identifier identifier number number
subscript-expression additive-expression
Assign-expression
The phases of a compiler-Parser
Mrs.D.Jena
Catherine
Bel,
AP/CSE,
VEC
25. The semantics of a program are its “meaning”,
as opposed to its syntax, or structure, that
determines some of its running time behaviors prior
to execution.
Static semantics: declarations and type checking
Attributes: The extra pieces of information
computed by semantic analyzer
An example: a[index]=4+2
The syntax tree annotated with attributes
25
The phases of a compiler –Semantic
Analyzer
Mrs.D.Jena
Catherine
Bel,
AP/CSE,
VEC
26. The Annotated Syntax Tree
26
a index 4 2
identifier identifier number number
subscript-expression additive-expression
integer integer
Assign-expression
array of integer integer integer integer
The phases of a compiler-Semantic
Analyzer
Mrs.D.Jena
Catherine
Bel,
AP/CSE,
VEC
27. • An intermediate code generator
• takes a parse tree from the semantic analyzer
• generates a program in the intermediate language.
• In some compilers, a source program is translated into an
intermediate code first and then the intermediate code is
translated into the target language.
• In other compilers, a source program is translated directly
into the target language.
27
The phases of a compiler-Intermediate
Code Generation
Mrs.D.Jena
Catherine
Bel,
AP/CSE,
VEC
28. • Using intermediate code is beneficial when compilers
which translates a single source language to many
target languages are required.
• One of the popular intermediate code is three-
address code. A three-address code instruction is in
the form of x = y op z.
28
The phases of a compiler-Intermediate
Code Generation
t = 4 + 2
a[index]=t
Mrs.D.Jena
Catherine
Bel,
AP/CSE,
VEC
29. • Replacing an inefficient sequence of instructions with a
better sequence of instructions.
• Sometimes called code improvement.
• Code optimization can be done:
• after semantic analyzing
• performed on a parse tree
• after intermediate code generation
• performed on a intermediate code
• after code generation
• performed on a target code
29
The phases of a compiler-
Code Optimizer
An example: a[index]=4+2
Mrs.D.Jena
Catherine
Bel,
AP/CSE,
VEC
30. 30
t = 4 + 2
a[index]=t
t = 6
a[index]=t
a[index]=6
The phases of a compiler-
Code Optimizer
Mrs.D.Jena
Catherine
Bel,
AP/CSE,
VEC
31. • It takes the intermediate code or IR and generates
code for target machine
• The properties of the target machine become the
major factor:
• Using instructions and representation of data
• An example: a[index]=4+2
• Code sequence in a hypothetical assembly language
31
The phases of a compiler-
Code Generation
Mrs.D.Jena
Catherine
Bel,
AP/CSE,
VEC
32. A possible code sequence
32
a[index]=6
MOV R0, index
MUL R0,2
MOV R1,&a
ADD R1,R0
MOV *R1,6
The phases of a compiler-
Code Generation
Mrs.D.Jena
Catherine
Bel,
AP/CSE,
VEC
33. Static (or compile-time) errors must be reported
by a compiler
Generate meaningful error messages and resume
compilation after each error
Each phase of a compiler needs different kind of error
handing
Exception handling
Generate extra code to perform suitable runtime tests
to guarantee all such errors to cause an appropriate
event during execution.
33
BACK
Auxiliary components-
Error Handling
Mrs.D.Jena
Catherine
Bel,
AP/CSE,
VEC
34. • Identifiers are names of variables, constants, functions, data
types, etc.
• Store information associated with identifiers
• Information associated with different types of identifiers can be
different
• Information associated with variables are name, type,
address,size (for array), etc.
• Information associated with functions are name,type of return
value, parameters, address, etc.
34
Auxiliary components-
Symbol Table
Mrs.D.Jena
Catherine
Bel,
AP/CSE,
VEC
35. Auxiliary components- Symbol
Table
• Accessed in every phase of compilers
• The scanner, parser, and semantic analyzer put names of
identifiers in symbol table.
• The semantic analyzer stores more information (e.g.
data types) in the table.
• The intermediate code generator, code optimizer and
code generator use information in symbol table to
generate appropriate code.
• Mostly use hash table for efficiency.
35
Mrs.D.Jena
Catherine
Bel,
AP/CSE,
VEC
36. • Store constants and strings used in program
• reduce the memory size by reusing constants and
strings
• Can be combined with symbol table
36
Auxiliary components-
Literal Table
Mrs.D.Jena
Catherine
Bel,
AP/CSE,
VEC
37. Passes
The repetitions to process the entire source program
before generating code are referred as passes.
Passes may or may not correspond to phases
A pass often consists of several phases
A compiler can be one pass, which results in efficient compilation
but less efficient target code
Most compilers with optimization use more than one pass
One Pass for scanning and parsing
One Pass for semantic analysis and source-level optimization
The third Pass for code generation and target-level optimization
37
Mrs.D.Jena
Catherine
Bel,
AP/CSE,
VEC
38. Bootstrapping
38
• Bootstrapping answers - “How the first compiler (a
translator program) was compiled”.
• Compiler is characterized by
• it’s source language
• Object language
• The language in which the compiler is written.
The notation CS
LA refers
the compiler C written in the language S to compile
the language L to produce the object language A.
Mrs.D.Jena
Catherine
Bel,
AP/CSE,
VEC
39. Bootstrapping
• First Step: Write the compiler for the language L for the
machine A with object language A (CA
LA )
• is write a small compiler CA
SA that translates a subset S of the
language L into the machine language A.
• This is written in the machine language of the machine A.
• In the second step, write a compiler for L in the language S for
which the compiler is already written. (CS
LA )
• In the third step compile the compiler program written in S for L
using the compiler for S.
39
Mrs.D.Jena
Catherine
Bel,
AP/CSE,
VEC
40. Bootstrapping
• Using the compiler CA
LA it will be possible to write
compiler for any machine.
• A compiler running on one machine, producing object
code for another machine is called Cross Compiler.
40
CA
SA
CS
LA CA
LA
Mrs.D.Jena
Catherine
Bel,
AP/CSE,
VEC
41. Compiler Construction Tools
• The software producing compilers on giving the specification
of the source language and the target machine
• Some of the compiler writing tools are compiler-compilers,
compiler generators, translator writing systems.
• The compiler writing tools need the following information.
• Description of the lexical and syntactic structure of the source
language.
• Description of what output is to be generated for each source
language constructs.
• Description of the target machine.
41
Mrs.D.Jena
Catherine
Bel,
AP/CSE,
VEC
42. Compiler Construction Tools
The principal aids provided by the existing compiler – compilers are
Scanner Generator
• Transition diagram is the formal tool used for generating scanners.
• The transition diagrams are constructed using the regular expressions
defined to recognizing the various types of tokens.
Parser Generator
• The parsers are normally non-procedural and written using Context-
free grammars.
• The context free grammars are quite easy to specify the syntactic
structure of any language
• Highly reliable and easy for any correction.
•
42
Mrs.D.Jena
Catherine
Bel,
AP/CSE,
VEC
43. Compiler Construction Tools
Facilities for code generation
• Code generation is done by writing high level language
routines and calling them at the right time.
• The selection of the routine and generating the code can be
done with the help of decision tables.
43
Mrs.D.Jena
Catherine
Bel,
AP/CSE,
VEC