System Software: Module 4 presentation
1. MODULE 4
Compilers - Phases of a compiler - Lexical, Syntax,
Intermediate code generation, Optimization, Code
generation, Symbol table and error correcting
routines – Passes of a compiler.
2. What is a Compiler?
• A compiler is a computer program that translates code written in one
programming language into another language.
• In other words, a compiler translates source code written in a high-level
programming language into machine code.
3. • A program given as input to the compiler is called the Source program.
The machine-level program that the compiler converts this source into is
known as the Object code.
4. Compiler Architecture
A compiler can broadly be divided into two phases based on the way it
compiles.
1. Analysis Phase
• Known as the front end of the compiler, the analysis phase reads the
source program, divides it into core parts, and then checks for lexical,
grammar, and syntax errors.
• The analysis phase generates an intermediate representation of the source
program and a symbol table, which are fed to the Synthesis phase as
input.
2. Synthesis Phase
• Known as the back end of the compiler, the synthesis phase generates the
target program with the help of the intermediate code representation and the
symbol table.
5. • A compiler can have many phases and passes.
Pass: A pass is one complete traversal of the compiler through the entire
program.
Phase: A phase of a compiler is a distinguishable stage that takes input
from the previous stage, processes it, and yields output that serves as
input for the next stage. A pass can have more than one phase.
7. 1. Lexical Analysis:
• Lexical analysis, performed by the lexical analyzer, is the first phase of the
compiler. This phase scans the source code and transforms the input
program into a series of tokens.
• A token is a sequence of characters that forms a unit of
information in the source code.
• NOTE: In computer science, a program that performs
lexical analysis is called a scanner, tokenizer, or lexer.
8. Roles and Responsibilities of Lexical Analyzer
• It is responsible for removing comments and white space
from the source program.
• It identifies the tokens.
• It categorizes lexical units.
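The scanning described above can be sketched with a regular-expression based tokenizer. This is a minimal, hypothetical lexer: the token names and patterns are illustrative, not those of any particular compiler.

```python
import re

# Illustrative token classes; white space is matched but discarded,
# as the lexical analyzer is responsible for removing it.
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("IDENT",  r"[A-Za-z_]\w*"),
    ("OP",     r"[+\-*/=]"),
    ("SKIP",   r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))

def tokenize(source):
    """Scan the source string and return a list of (kind, text) tokens."""
    tokens = []
    for m in MASTER.finditer(source):
        if m.lastgroup != "SKIP":          # drop white space
            tokens.append((m.lastgroup, m.group()))
    return tokens

print(tokenize("sum = a + 42"))
# [('IDENT', 'sum'), ('OP', '='), ('IDENT', 'a'), ('OP', '+'), ('NUMBER', '42')]
```

A real lexer would also track line numbers and report illegal characters; this sketch only shows the token-stream idea.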
9. 2. Syntax Analysis:
• Syntax analysis is the second stage of the compilation procedure.
• Here the provided input string is checked against the
structure of the language's grammar.
• In other words, this phase analyses the syntactic structure and
checks whether the given input is correct in terms of programming
syntax.
• It accepts tokens as input and produces a parse tree as output. It is
also known as parsing in a compiler.
10. • Parse: to resolve (a sentence) into its component parts and describe
their syntactic roles.
• Tree: a structure with a root value and sub-trees.
Parse Tree:
• A parse tree is the hierarchical representation of terminals and non-
terminals.
• These symbols (terminals and non-terminals) represent the derivation
of the grammar that yields the input string.
• The start symbol of the grammar is used as the root of the
parse tree.
• Leaves of the parse tree represent terminals.
• Each interior node represents a production of the grammar.
11. Roles and Responsibilities of Syntax Analyzer
• Acquires tokens from the lexical analyzer.
• Helps in building the parse tree.
• Reports syntax errors, if any.
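Parsing can be sketched as a recursive-descent parser for a tiny hypothetical grammar (`expr -> term (('+'|'-') term)*`, `term -> NUMBER | IDENT`). It consumes tokens of the form `(kind, text)` and builds a tree as nested tuples; the grammar and node names are illustrative assumptions.

```python
def parse_expr(tokens, pos=0):
    """expr -> term (('+'|'-') term)*  ; returns (tree, next position)."""
    node, pos = parse_term(tokens, pos)
    while pos < len(tokens) and tokens[pos][1] in ("+", "-"):
        op = tokens[pos][1]
        right, pos = parse_term(tokens, pos + 1)
        node = (op, node, right)      # interior node = production applied
    return node, pos

def parse_term(tokens, pos):
    """term -> NUMBER | IDENT ; leaves of the tree are terminals."""
    kind, text = tokens[pos]
    if kind in ("NUMBER", "IDENT"):
        return (kind, text), pos + 1
    raise SyntaxError(f"unexpected token {text!r}")   # syntax error reported

tree, _ = parse_expr([("IDENT", "a"), ("OP", "+"), ("NUMBER", "2")])
print(tree)   # ('+', ('IDENT', 'a'), ('NUMBER', '2'))
```

The nested-tuple output mirrors the parse tree: the operator production is the interior node and the terminals are the leaves.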
12. 3. Semantic Analysis:
• In the process of compilation, semantic analysis is the third phase.
• It checks whether the parse tree follows the rules of the language.
• Semantic Analysis makes sure that declarations and statements of the
program are semantically correct.
• Both the syntax tree of the previous phase and the symbol table are
used to check the consistency of the given code.
• Type checking is an important part of semantic analysis where the
compiler makes sure that each operator has matching operands.
• It also helps in keeping track of identifiers and expressions.
13. • In simple words, a semantic analyzer determines the
validity of the parse tree, and an annotated syntax tree is produced as
output.
Roles and Responsibilities of Semantic Analyzer:
• Saving collected information to the symbol table or syntax tree.
• Scanning for semantic errors.
• Reporting semantic errors.
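Type checking and undeclared-variable detection, the two semantic checks named above, can be sketched over a tuple-shaped syntax tree. The two-type system (`int`/`str`) and the symbol-table contents here are illustrative assumptions.

```python
def check(node, symtab):
    """Return the type of a tuple-shaped expression tree, or raise a semantic error."""
    kind = node[0]
    if kind == "NUMBER":
        return "int"
    if kind == "IDENT":
        if node[1] not in symtab:                      # undeclared variable
            raise NameError(f"undeclared variable {node[1]!r}")
        return symtab[node[1]]
    if kind == "+":
        left = check(node[1], symtab)
        right = check(node[2], symtab)
        if left != right:                              # operands must match
            raise TypeError(f"operand types {left} and {right} do not match")
        return left
    raise ValueError(f"unknown node kind {kind!r}")

symtab = {"a": "int"}
print(check(("+", ("IDENT", "a"), ("NUMBER", "2")), symtab))   # int
```

A full semantic analyzer would also annotate the tree with the computed types rather than just returning them.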
14. 4. Intermediate Code Generation:
• Once the parse tree is semantically verified, an intermediate code
generator produces three-address code.
• The middle-level code generated by a compiler during
the translation of a source program into object code is known as
intermediate code or intermediate text.
Few Important Pointers:
• Intermediate code is neither high-level nor machine code, but
middle-level code.
• It can be translated to machine code later.
• This stage serves as a bridge from analysis to synthesis.
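Three-address code generation can be sketched as a walk over a tuple-shaped syntax tree: each interior node yields one instruction with at most three addresses, using fresh temporaries `t1`, `t2`, … (the temporary names and string format are illustrative).

```python
def gen_tac(node, code, counter=None):
    """Emit three-address instructions into `code`; return the operand name."""
    if counter is None:
        counter = [0]
    kind = node[0]
    if kind in ("NUMBER", "IDENT"):
        return node[1]                     # leaves are plain operands
    left = gen_tac(node[1], code, counter)
    right = gen_tac(node[2], code, counter)
    counter[0] += 1
    temp = f"t{counter[0]}"                # fresh temporary for this node
    code.append(f"{temp} = {left} {kind} {right}")
    return temp

code = []
gen_tac(("+", ("+", ("IDENT", "a"), ("IDENT", "b")), ("NUMBER", "2")), code)
print(code)   # ['t1 = a + b', 't2 = t1 + 2']
```

Note how `a + b + 2` becomes two instructions, each with one operator and at most three addresses, which is exactly the three-address form described above.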
16. Roles and Responsibilities:
• Helps in maintaining the evaluation order of the source language.
• Produces code that can later be translated into machine code.
• Holds the operands of instructions.
17. 5. Code Optimizer:
• Next comes a phase that is entirely optional: code
optimization.
• It is used to improve the intermediate code.
• As a result, the program runs faster and consumes
less space.
• To improve the speed of the program, it eliminates unnecessary
code and reorganizes the sequence of statements.
18. Roles and Responsibilities:
• Remove unused variables and unreachable code.
• Improve the runtime performance of the program.
• Produce streamlined code from the intermediate representation.
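Two classic optimizations, constant folding and dead-code elimination, can be sketched over a list of three-address instructions written as `dest = expr` strings (an illustrative format, not a real IR).

```python
def optimize(code, live):
    """Fold constant expressions, then drop assignments whose target is never used.
    `live` is the set of names that must survive (e.g. program outputs)."""
    folded = []
    for line in code:
        dest, expr = line.split(" = ", 1)
        parts = expr.split()
        # Constant folding: evaluate when both operands are integer literals.
        if len(parts) == 3 and parts[0].isdigit() and parts[2].isdigit():
            expr = str(eval(expr))   # safe here: only digits and one operator
        folded.append(f"{dest} = {expr}")
    # Dead-code elimination: a destination is useful only if something reads it.
    used = set(live)
    for line in folded:
        used.update(line.split(" = ", 1)[1].split())
    return [line for line in folded if line.split(" = ", 1)[0] in used]

code = ["t1 = 4 * 2", "t2 = t1 + x", "t3 = t1 + 1"]
print(optimize(code, live={"t2"}))   # ['t1 = 8', 't2 = t1 + x']
```

`4 * 2` is folded to `8` at compile time, and `t3`, which nothing reads, is removed, illustrating both "eliminate unnecessary code" and "consume less space".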
19. 6. Code Generator:
• Code generation is the final stage of the compilation
process.
• This final phase takes the fully optimized intermediate code as input
and maps it to the target machine language.
• The code generator translates the intermediate code
into machine code.
Roles and Responsibilities:
• Translate the intermediate code to target machine code.
• Select and allocate memory locations and registers.
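The mapping from intermediate to target code can be sketched for a tiny hypothetical register machine: each `dest = left op right` instruction becomes a LOAD/operate/STORE triple. The instruction set and register name are illustrative assumptions, not a real ISA.

```python
def gen_target(tac):
    """Translate `dest = left op right` strings into pseudo-assembly."""
    asm = []
    for line in tac:
        dest, expr = line.split(" = ", 1)
        left, op, right = expr.split()
        asm.append(f"LOAD  R1, {left}")                       # operand -> register
        asm.append(f"{'ADD' if op == '+' else 'SUB'}   R1, {right}")
        asm.append(f"STORE R1, {dest}")                       # register -> memory
    return asm

for instruction in gen_target(["t1 = a + b"]):
    print(instruction)
# LOAD  R1, a
# ADD   R1, b
# STORE R1, t1
```

A real code generator would also choose among several registers and reuse values already loaded, which is the "select and allocate registers" responsibility above.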
20. Symbol Table
• It is an important data structure created and maintained by the
compiler in order to keep track of semantics of variables.
• It stores information about the scope and binding information about
names, information about instances of various entities such as
variable and function names, classes, objects, etc.
• It is built during the lexical and syntax analysis phases.
• The information is collected by the analysis phases of the compiler
and is used by the synthesis phases of the compiler to generate code.
• It is used by the compiler to achieve compile-time efficiency.
21. It is used by the various phases of the compiler as follows:
1) Lexical Analysis: creates new entries in the table, for example
entries for tokens.
2) Syntax Analysis: adds information regarding attribute type, scope,
dimension, line of reference, use, etc. to the table.
3) Semantic Analysis: uses the available information in the table to check
semantics, i.e. to verify that expressions and assignments are semantically
correct (type checking), and updates it accordingly.
4) Intermediate Code Generation: refers to the symbol table to know how much
run-time storage is allocated and of what type; the table also helps in adding
temporary-variable information.
5) Code Optimization: uses information present in the symbol table for
machine-dependent optimization.
6) Target Code Generation: generates code using the address information of
identifiers present in the table.
22. Symbol Table stores:
a. Literal constants and strings.
b. Function names.
c. Variable names and constants.
d. Labels in the source language.
23. Advantages of Symbol Table
• The efficiency of a program can be increased by using symbol tables, which
give quick and simple access to variables, function names, data types, and
memory locations.
• Symbol tables can be used to organize and simplify code.
• Faster code execution: by offering quick access to information such as memory
addresses, symbol tables optimize code execution by lowering the number of
memory accesses required.
• Symbol tables can be used to increase the portability of code and make it
simpler to migrate code between different systems or programming
languages.
• Improved code reuse: symbol tables can be utilized to increase the reuse of
code across multiple projects.
• Symbol tables facilitate easy access to and examination of a
program's state during execution, enhancing debugging by making it
simpler to identify and correct mistakes.
24. Implementation
• If a compiler handles only a small amount of data, the symbol
table can be implemented as an unordered list, which is easy to code
but suitable only for small tables.
• A symbol table can be implemented in one of the following ways:
Linear (sorted or unsorted) list
Binary search tree
Hash table
• Among these, symbol tables are mostly implemented as hash tables,
where the source-code symbol itself is treated as the key for the hash
function and the return value is the information about the symbol.
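The hash-table approach can be sketched with Python's built-in hash table (`dict`), keyed by the symbol name as described above. The stored attributes (type, scope) are illustrative.

```python
class SymbolTable:
    """A minimal hash-table symbol table: name -> attribute record."""

    def __init__(self):
        self._table = {}

    def insert(self, name, **attrs):
        """Lexical/syntax phases create or update an entry."""
        self._table[name] = attrs

    def lookup(self, name):
        """Later phases query an entry; None means the symbol is undeclared."""
        return self._table.get(name)

st = SymbolTable()
st.insert("count", type="int", scope="global")
print(st.lookup("count"))    # {'type': 'int', 'scope': 'global'}
print(st.lookup("missing"))  # None
```

A production compiler would additionally keep one table per scope (or chain tables) so that inner declarations can shadow outer ones.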
25. Error detection and Recovery in Compiler
• In this phase of compilation, all possible errors made by the user are
detected and reported to the user in the form of error messages.
• This process of locating errors and reporting them to users is called
the Error Handling process.
Functions of an Error Handler:
oDetection
oReporting
oRecovery
26. Classification of Errors
Lexical phase errors
• These errors are detected during the lexical analysis phase. Typical lexical
errors are:
• Exceeding the length of identifier or numeric constants.
• The appearance of illegal characters
• Unmatched string
27. Syntactic phase errors:
• These errors are detected during the syntax analysis phase. Typical
syntax errors are:
• Errors in structure
• Missing operator
• Misspelled keywords
• Unbalanced parenthesis
Semantic errors
• These errors are detected during the semantic analysis phase. Typical
semantic errors are
• Incompatible type of operands
• Undeclared variables
• Mismatch between actual and formal arguments
29. Disadvantages of error recovery:
Slower compilation time
Increased complexity
Risk of silent errors.
Potential for incorrect recovery
Dependency on the recovery mechanism
Difficulty in diagnosing errors
Compatibility issues
30. Error Handling in Compiler Design
• The tasks of the Error Handling process are to detect each error,
report it to the user, and then make some recovery strategy and
implement them to handle the error.
• Error handling = Error Detection + Error Reporting + Error Recovery
• An error can be as simple as a blank entry in the symbol table.
• Errors in the program should be detected and reported by the parser.
• Whenever an error occurs, the parser can handle it and continue to
parse the rest of the input.
• Although the parser is mostly responsible for checking for errors,
errors may occur at various stages of the compilation process.
31. Compiler Passes
• A pass is a complete traversal of the source program.
• Compiler passes are of two types: Single-Pass Compiler, and Two-Pass
Compiler or Multi-Pass Compiler.
Multi-pass Compiler
• A multi-pass compiler processes the source code of a program several
times.
• In the first pass, the compiler reads the source program, scans it, extracts the
tokens, and stores the result in an output file.
• In the second pass, the compiler reads the output file produced by the first pass,
builds the syntax tree, and performs syntactic analysis. The output of
this phase is a file containing the syntax tree.
• In the third pass, the compiler reads the output file produced by the second pass
and checks whether the tree follows the rules of the language. The output of
the semantic analysis phase is the annotated syntax tree.
• Passes continue in this way until the target output is produced.
32. One-pass Compiler
• A one-pass compiler traverses the program only once.
• It passes only once through each part of every compilation unit.
• It translates each part directly into its final machine code.
• In a one-pass compiler, as each source line is processed, it is scanned
and its tokens are extracted.
• Then the syntax of the line is analyzed and the tree structure is built. After
the semantic part, the code is generated.
• The same process is repeated for each line of code until the entire program
is compiled.
• A single-pass compiler is faster and smaller than a multi-pass compiler.
• A disadvantage of a single-pass compiler is that the code it generates is
less efficient than that of a multi-pass compiler.
33. One-pass vs Two-pass
• One-pass: performs translation in a single pass. Two-pass: performs
translation in two passes.
• One-pass: scans the entire file only once. Two-pass: requires two passes
to scan the source file.
• One-pass: does not generate intermediate code. Two-pass: generates
intermediate code in its first pass.
• One-pass: faster than a two-pass assembler. Two-pass: slower than a
one-pass assembler.
• One-pass: no loader is required, as no object program is written.
Two-pass: a loader is required, as object code is generated.
• One-pass: performs some processing of assembler directives. Two-pass:
performs the processing of assembler directives not done in pass 1.
• One-pass data structures: the symbol table, literal table, pool table, and
table of incomplete instructions. Two-pass data structures: the symbol
table, literal table, and pool table.
• One-pass assemblers perform the whole conversion of assembly code to
machine code in one go. Two-pass assemblers first process the assembly
code and store values in the opcode table and symbol table, and then in
the second step generate the machine code using these tables.
• Example: C and Pascal use one-pass compilers. Modula-2 uses a
multi-pass compiler.