1. Explain the different phases of a Compiler.

Compiler:
A compiler is a program which translates a program written in one language (the source language) into an equivalent program in another language (the target language).

Source program -> Compiler -> Target program

A compiler is software for translating a high-level language (HLL) into machine-level language. It is a translator, and it must know both the high-level language and the architecture of the computer. Most compilers are machine dependent, but some compilers are machine independent, e.g. Java. Turbo C is common to both C and C++.

Need for a Compiler:
We need compilers because the computer does not understand the source program; it has to be converted into a machine-understandable language, and we use compilers for this purpose. We cannot use the same compiler for all languages, because every HLL has its own syntax.

Phases of a Compiler:
A compiler takes a source program as input and produces as output an equivalent sequence of machine instructions. This process is so complex that it is divided into a series of sub-processes called phases. The different phases of a compiler are as follows:

Analysis phases:
1. Lexical Analysis
2. Syntax Analysis
3. Semantic Analysis

Synthesis phases:
4. Intermediate Code Generation
5. Code Optimization
6. Code Generation

1. Lexical Analysis:
It is the first phase of a compiler. The lexical analyzer, or scanner, reads the characters of the source program and groups them into a stream of tokens. The usual tokens are identifiers, keywords, constants, operators, and punctuation symbols such as the comma and parentheses. Each token is a substring of the source program that is to be treated as a single unit. Tokens are of two types:
1. Specific strings. Eg: if, semicolon.
2. Classes of strings. Eg: identifiers, constants, labels.

A token is treated as a pair consisting of two parts: the token type and the token value. The character sequence forming a token is called the lexeme for the token. Certain tokens are augmented with a lexical value: the lexical analyzer not only generates the token but also enters the lexeme into the symbol table.

Symbol table:
1   a
2   b
3   c

Token values are represented by pairs in square brackets; the second component of the pair is an index into the symbol table, where the information about the identifier is kept. For eg., consider the expression

a = b + c * 20

After lexical analysis it will be

id1 = id2 + id3 * 20

The lexical phase can detect errors where the characters remaining in the input do not form any token of the language. Eg: an unrecognized keyword.

2. Syntax Analysis:
It groups tokens together into syntactic structures called expressions. Expressions might further be combined to form statements. Often the syntactic structure can be regarded as a tree whose leaves are tokens; such trees are called parse trees.

The parser has two functions. The first is to check that the tokens occur in patterns that are permitted by the specification of the source language, i.e. syntax checking. For eg., consider the expression A +/ B. After lexical analysis this reaches the syntax analyzer as the token sequence id +/ id. On seeing /, the syntax analyzer should detect an error, because the presence of two adjacent binary operators violates the formation rules of the language. The second function is to make explicit the hierarchical structure of the incoming token stream by identifying which parts of the token stream should be grouped together.

Syntax analysis detects syntax errors.

3. Semantic Analysis:
An important role of semantic analysis is type checking. Here the compiler checks that each operator has operands that are permitted by the source language specification. Consider the eg: x = a + b (tree diagram omitted in these notes).

The language specification may permit some operand coercions, for eg. when a binary arithmetic operator is applied to an integer and a real. In this case, the compiler may need to convert the integer to a real. In this phase the compiler detects type-mismatch errors.

4. Intermediate Code Generation:
It uses the structure produced by the syntax analyzer to create a stream of simple instructions. Many styles are possible.
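Before looking at the common styles of intermediate code, the lexical phase described above can be illustrated concretely. The sketch below (Python; the token classes and regular-expression patterns are illustrative assumptions, not part of any particular compiler) turns a = b + c * 20 into the token stream id1 = id2 + id3 * 20 while filling the symbol table:

```python
import re

# Token classes and their patterns (illustrative; order matters).
TOKEN_SPEC = [
    ("NUM",    r"\d+"),             # constants
    ("ID",     r"[A-Za-z_]\w*"),    # identifiers
    ("ASSIGN", r"="),
    ("OP",     r"[+\-*/]"),
    ("SKIP",   r"\s+"),             # whitespace: discarded
]

def tokenize(source):
    """Group the characters of `source` into a stream of (type, value)
    tokens, entering each identifier's lexeme into the symbol table."""
    symbol_table = {}   # lexeme -> index into the table
    tokens = []
    pattern = "|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC)
    for m in re.finditer(pattern, source):
        kind, lexeme = m.lastgroup, m.group()
        if kind == "SKIP":
            continue
        if kind == "ID":
            # the token value is the symbol-table index, so 'a' becomes id1
            index = symbol_table.setdefault(lexeme, len(symbol_table) + 1)
            tokens.append(("id", index))
        else:
            tokens.append((kind, lexeme))
    return tokens, symbol_table

tokens, table = tokenize("a = b + c * 20")
print(tokens)  # [('id', 1), ('ASSIGN', '='), ('id', 2), ('OP', '+'), ('id', 3), ('OP', '*'), ('NUM', '20')]
print(table)   # {'a': 1, 'b': 2, 'c': 3}
```

A real scanner would also report any stretch of input that matches no pattern (the "unrecognized token" errors mentioned above); this sketch simply skips such characters.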
One common style uses instructions with one operator and a small number of operands. The output of the previous phase is some representation of a parse tree; this phase transforms that parse tree into an intermediate language. One popular type of intermediate language is called three-address code. A typical three-address code statement is A = B op C, where A, B and C are operands and op is a binary operator.

Eg: a = b + c * 20 becomes

T1 = inttoreal(20)
T2 = id3 * T1
T3 = id2 + T2
id1 = T3

Here T1, T2, T3 are temporary variables, and id1, id2, id3 are the identifiers corresponding to a, b, c. The inttoreal step converts the integer constant 20 to a real, as required by semantic analysis.

5. Code Optimization:
It is designed to improve the intermediate code so that the object program takes less space and runs faster. Optimization may involve:
1. Detection and removal of dead code.
2. Calculation of constant expressions and terms.
3. Collapsing of repeated expressions into temporary storage.
4. Loop unrolling.
5. Moving code outside loops.
6. Removal of unnecessary temporary variables.

For eg., a = b + c * 20 becomes

T1 = id3 * 20.0
id1 = id2 + T1

Here the conversion of 20 is done once at compile time, and the unnecessary temporaries T2 and T3 are removed.

6. Code Generation:
Once optimization is completed, the intermediate code is mapped into the target language. This involves:
1. Allocation of registers and memory.
2. Generation of correct references.
3. Generation of correct data types.
4. Generation of machine code.

Eg:
MOVF id3, R2
MULF # 20.0, R2
MOVF id2, R1
ADDF R2, R1
MOVF R1, id1

2. Compiler Construction Tools:
A number of tools have been developed to help implement the various phases of a compiler. Some useful compiler construction tools are as follows:
1. Parser generators.
2. Scanner generators.
3. Syntax-directed translation engines.
4. Automatic code generators.
5. Data-flow engines.

1. Parser Generators:
These produce syntax analyzers, normally from input that is based on a context-free grammar. In early compilers, syntax analysis consumed not only a large fraction of the running time of a compiler but also a large fraction of the intellectual effort of writing a compiler. With these tools the phase is now very easy to implement.
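What a parser generator produces from a context-free grammar can be imitated by hand. The sketch below (Python; the grammar, the token shapes and the nested-tuple tree representation are illustrative assumptions) is a recursive-descent syntax analyzer for expressions such as id2 + id3 * 20, and it rejects malformed input like the A +/ B example from question 1:

```python
def parse_expr(tokens):
    """Recursive-descent parser for the grammar
         expr   -> term { '+' term }
         term   -> factor { '*' factor }
         factor -> ID | NUM
       returning a nested-tuple parse tree."""
    pos = [0]

    def peek():
        return tokens[pos[0]] if pos[0] < len(tokens) else None

    def take():
        tok = tokens[pos[0]]
        pos[0] += 1
        return tok

    def factor():
        kind, value = take()
        if kind not in ("ID", "NUM"):
            raise SyntaxError(f"unexpected token {value!r}")
        return value

    def term():
        node = factor()
        while peek() == ("OP", "*"):
            take()
            node = (node, "*", factor())   # '*' groups tighter than '+'
        return node

    def expr():
        node = term()
        while peek() == ("OP", "+"):
            take()
            node = (node, "+", term())
        return node

    tree = expr()
    if peek() is not None:                 # trailing tokens are an error
        raise SyntaxError(f"unexpected token {peek()[1]!r}")
    return tree

# token stream for  id2 + id3 * 20
tokens = [("ID", "id2"), ("OP", "+"), ("ID", "id3"), ("OP", "*"), ("NUM", "20")]
print(parse_expr(tokens))   # ('id2', '+', ('id3', '*', '20'))
```

Feeding it the stream for id +/ id, e.g. [("ID", "id1"), ("OP", "+"), ("OP", "/"), ("ID", "id2")], raises SyntaxError: the second binary operator cannot start a factor, mirroring how the syntax analyzer detects the adjacent operators in A +/ B.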
Many parser generators utilize powerful parsing algorithms that are too complex to be carried out by hand.

2. Scanner Generators:
These automatically generate lexical analyzers, normally from a specification based on regular expressions. The basic organization of the resulting lexical analyzer is, in effect, a finite automaton.

3. Syntax-Directed Translation Engines:
These produce collections of routines that walk the parse tree and produce output such as intermediate code. The basic idea is that one or more translations are associated with each node of the parse tree, and each translation is defined in terms of the translations at its neighbouring nodes in the tree.

4. Automatic Code Generators:
This tool takes a collection of rules that define the translation of each operation of the intermediate language into the machine language of the target machine. The rules must include sufficient detail to handle the different possible access methods for data. Eg: variables may be in registers, in fixed locations in memory, or allocated positions on a stack. The basic technique is "template matching": the intermediate-code statements are replaced by templates that represent sequences of machine instructions, in such a way that the assumptions about the storage of variables match from template to template.

5. Data-Flow Engines:
Much of the information needed to perform good code optimization involves "data-flow analysis": the gathering of information about how values are transmitted from one part of a program to each other part.

3. Issues in the design of a Code Generator:
Since the code generation phase is system dependent, the following issues arise during code generation:
1. Input to the code generator.
2. Target program.
3. Memory management.
4. Instruction selection.
5. Register allocation.
6. Evaluation order.

1. Input to the code generator:
The input is an intermediate code, which may take one of several forms:
1. Linear representation, eg. postfix notation.
2. Three-address representation, eg. quadruples.
3. Virtual-machine representation, eg. stack machine code.
4. Graphical representation, eg. syntax trees and DAGs.

The intermediate language should be representable by quantities that the target machine can directly manipulate. By the time code generation begins, type checking has already taken place and semantic errors have already been detected, with type-conversion operations inserted where necessary. Thus the input to the code generator must be free of errors.

2. Target Programs:
The output of the code generator is the target program, which may take one of several forms:
1. Absolute machine language.
2. Relocatable machine language.
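Tying the notes together, the step from the optimized three-address code of question 1 (T1 = id3 * 20.0; id1 = id2 + T1) to the MOVF/MULF/ADDF sequence can be sketched as a naive translation pass. Everything here is a toy assumption: a two-register machine, temporaries named T1, T2, ..., and '#' marking immediate constants:

```python
def codegen(tac):
    """Naively map three-address statements  dest = a op b  to the
    two-register floating-point instructions used in these notes.
    Names starting with 'T' are temporaries; others live in memory."""
    asm, loc = [], {}            # loc: value name -> register holding it
    free = ["R1", "R2"]          # available registers
    ops = {"+": "ADDF", "*": "MULF"}
    for stmt in tac:
        dest, rhs = [s.strip() for s in stmt.split("=")]
        a, op, b = rhs.split()
        r = loc.pop(a, None)     # reuse the register already holding a
        if r is None:            # otherwise load a from memory
            r = free.pop()
            asm.append(f"MOVF {a}, {r}")
        if b in loc:             # right operand already in a register
            src = loc[b]
        elif b.replace(".", "").isdigit():
            src = f"# {b}"       # immediate constant
        else:
            src = b              # memory operand
        asm.append(f"{ops[op]} {src}, {r}")
        loc[dest] = r
        if not dest.startswith("T"):      # program variable: store it
            asm.append(f"MOVF {r}, {dest}")
    return asm

print("\n".join(codegen(["T1 = id3 * 20.0", "id1 = id2 + T1"])))
# MOVF id3, R2
# MULF # 20.0, R2
# MOVF id2, R1
# ADDF R2, R1
# MOVF R1, id1
```

Register allocation here is just a first-come pop from a free list; a real code generator must also decide when to spill registers to memory and in what order to evaluate operands, which is exactly the register-allocation and evaluation-order issues listed above.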
