Compiler Construction
LECTURE 3
Lecture Overview
Front-end
Parse Tree
Abstract Syntax Tree
Back-end
Instruction Selection
Register Allocation
Instruction Scheduling
RECOMMENDED TEXT BOOK: COMPILERS: PRINCIPLES, TECHNIQUES, AND TOOLS BY AHO, SETHI AND ULLMAN
Syntax Tree
A parse can be represented by a tree, called a parse tree or syntax tree.
Parse tree for x+2-y:
goal
  expr
    expr
      expr
        term
          <id,x>
      op
        +
      term
        <number,2>
    op
      -
    term
      <id,y>
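The parse tree on this slide can be reproduced with a few lines of code. The following is a minimal sketch, assuming the grammar goal → expr, expr → expr op term | term described in the notes; the names `Node`, `parse`, and `show` are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    label: str
    children: list = field(default_factory=list)


def parse(tokens):
    """Build a parse tree for: goal -> expr; expr -> expr op term | term."""
    pos = 0

    def term():
        nonlocal pos
        if pos >= len(tokens):
            raise SyntaxError("expected a term, found end of input")
        kind, lexeme = tokens[pos]
        if kind not in ("id", "number"):
            raise SyntaxError(f"expected id or number, got {lexeme!r}")
        pos += 1
        return Node("term", [Node(f"<{kind},{lexeme}>")])

    # expr -> term
    expr = Node("expr", [term()])
    # Every further "op term" applies expr -> expr op term,
    # so the tree grows down its left side.
    while pos < len(tokens):
        kind, lexeme = tokens[pos]
        if kind != "op":
            raise SyntaxError(f"expected an operator, got {lexeme!r}")
        pos += 1
        op = Node("op", [Node(lexeme)])
        expr = Node("expr", [expr, op, term()])
    return Node("goal", [expr])


def show(node, depth=0):
    """Return the tree as indented lines, one node per line (preorder)."""
    lines = ["  " * depth + node.label]
    for child in node.children:
        lines.extend(show(child, depth + 1))
    return lines


# Token stream for x+2-y, as a scanner might deliver it (assumed format).
tokens = [("id", "x"), ("op", "+"), ("number", "2"), ("op", "-"), ("id", "y")]
tree = parse(tokens)
print("\n".join(show(tree)))
```

The printed tree has 14 nodes for a five-token sentence, which is the redundancy the next slide refers to.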
Abstract Syntax Trees
The parse tree contains a lot of unneeded information.
Compilers often use an abstract syntax tree (AST).
Abstract Syntax Trees
This is much more concise.
An AST summarizes grammatical structure without the details of derivation.
ASTs are one kind of intermediate representation (IR).
AST for x+2-y:
-
  +
    <id,x>
    <number,2>
  <id,y>
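To make the size difference concrete, here is a minimal sketch that builds the AST for x+2-y directly from a token list. It assumes the input has the shape operand (op operand)* and that all operators are left-associative with equal precedence; `Leaf`, `BinOp`, `build_ast`, `count_nodes`, and `to_text` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Leaf:
    kind: str
    value: str


@dataclass
class BinOp:
    op: str
    left: object
    right: object


def build_ast(tokens):
    """Fold 'operand (op operand)*' into a left-leaning AST."""
    if len(tokens) % 2 == 0:
        raise SyntaxError("expected: operand (op operand)*")
    kind, value = tokens[0]
    ast = Leaf(kind, value)
    for i in range(1, len(tokens), 2):
        if tokens[i][0] != "op":
            raise SyntaxError(f"expected an operator, got {tokens[i][1]!r}")
        kind, value = tokens[i + 1]
        # The tree built so far becomes the left operand of the new operator.
        ast = BinOp(tokens[i][1], ast, Leaf(kind, value))
    return ast


def count_nodes(node):
    if isinstance(node, Leaf):
        return 1
    return 1 + count_nodes(node.left) + count_nodes(node.right)


def to_text(node):
    if isinstance(node, Leaf):
        return f"<{node.kind},{node.value}>"
    return f"({to_text(node.left)} {node.op} {to_text(node.right)})"


tokens = [("id", "x"), ("op", "+"), ("number", "2"), ("op", "-"), ("id", "y")]
ast = build_ast(tokens)
print(to_text(ast))
print(count_nodes(ast), "nodes")
```

The AST keeps only the two operators and three operands: no goal, expr, term, or op nodes remain.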
The Back End
[Figure: IR → Instruction selection → IR → Register allocation → IR → Instruction scheduling → machine code; every phase can report errors]
The Back End
Translate IR into target machine code.
Choose machine (assembly) instructions to implement each IR operation.
Ensure conformance with system interfaces.
Decide which values to keep in registers.
The Back End
Instruction Selection:
Produce fast, compact code.
Take advantage of target features such as addressing modes.
Usually viewed as a pattern-matching problem, commonly solved with dynamic programming.
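The pattern-matching view of instruction selection can be illustrated on the AST for x+2-y. The sketch below is not a real code generator: it assumes a hypothetical target with the instructions LOAD, LOADI, ADD, SUB, ADDI, and SUBI, each costing 1, and an AST encoded as nested tuples. It computes the cheapest cover of each subtree bottom-up, caching the results (the dynamic-programming part), and then emits code for the chosen tiles.

```python
import functools
import itertools

# Hypothetical mnemonics; an "I" suffix marks the register-immediate form.
MNEMONIC = {"+": "ADD", "-": "SUB"}


@functools.lru_cache(maxsize=None)
def best_tile(node):
    """Return (cost, tile) for the cheapest cover of the subtree at node."""
    if node[0] in ("id", "number"):
        return 1, "load"
    op, left, right = node
    # Register-register tile: covers only the operator node.
    options = [(1 + best_tile(left)[0] + best_tile(right)[0], "reg-reg")]
    if right[0] == "number":
        # Register-immediate tile: covers the operator and its constant operand.
        options.append((1 + best_tile(left)[0], "reg-imm"))
    return min(options)


def emit(node, code, fresh):
    """Emit instructions for the tiles picked by best_tile; return the result register."""
    if node[0] == "id":
        reg = next(fresh)
        code.append(f"LOAD {reg}, {node[1]}")
        return reg
    if node[0] == "number":
        reg = next(fresh)
        code.append(f"LOADI {reg}, {node[1]}")
        return reg
    op, left, right = node
    name = MNEMONIC[op]
    a = emit(left, code, fresh)
    if best_tile(node)[1] == "reg-imm":
        reg = next(fresh)
        code.append(f"{name}I {reg}, {a}, {right[1]}")
    else:
        b = emit(right, code, fresh)
        reg = next(fresh)
        code.append(f"{name} {reg}, {a}, {b}")
    return reg


# AST for x+2-y as nested tuples (assumed encoding).
ast = ("-", ("+", ("id", "x"), ("number", "2")), ("id", "y"))
code = []
emit(ast, code, (f"r{i}" for i in itertools.count(1)))
print("\n".join(code))
```

Because the immediate form covers the constant 2 together with the +, the sketch emits four instructions instead of the five that a one-instruction-per-node translation would need.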
The Back End
Instruction Selection:
Spurred by the move from the PDP-11 to the VAX-11 (CISC).
RISC architecture simplified this problem.
The Back End
Register Allocation:
Have each value in a register when it is used.
Manage a limited set of resources – register file.
Can change instruction choices and insert LOADs and STOREs.
Optimal register allocation is NP-Complete.
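Since optimal allocation is NP-Complete, practical allocators rely on heuristics. The following is a minimal sketch of one such heuristic, greedy coloring of an interference graph: two values interfere when they are live at the same time and therefore cannot share a register. The graph, the function name `allocate`, and the ordering rule are assumptions made for illustration; a spilled value is one that would need the extra LOADs and STOREs mentioned above.

```python
def allocate(interference, k):
    """Greedily assign one of k registers to each value, or mark it as spilled."""
    assignment, spilled = {}, []
    # Visit the most constrained values first (a heuristic, not an optimal order).
    order = sorted(interference, key=lambda v: (-len(interference[v]), v))
    for value in order:
        taken = {assignment[n] for n in interference[value] if n in assignment}
        free = [reg for reg in range(k) if reg not in taken]
        if free:
            assignment[value] = free[0]
        else:
            # No register left: this value lives in memory instead.
            spilled.append(value)
    return assignment, spilled


# Hypothetical interference graph: a, b, c are live together; d overlaps only c.
graph = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
}

two_regs = allocate(graph, 2)
three_regs = allocate(graph, 3)
print("k=2:", two_regs)
print("k=3:", three_regs)
```

With two registers one of the three mutually interfering values has to be spilled; with three registers every value fits.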
The Back End
Instruction Scheduling:
Avoid hardware stalls and interlocks.
Use all functional units productively.
Optimal scheduling is NP-Complete in nearly all cases.
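Because optimal scheduling is NP-Complete in nearly all cases, schedulers are usually greedy. Below is a minimal sketch of list scheduling, assuming a hypothetical machine that issues one instruction per cycle, made-up latencies, and a dependence graph without cycles. It always issues the ready instruction with the longest latency-weighted path to the end of the block, and compares the result with issuing in program order.

```python
def critical_path(instrs):
    """Priority of each instruction: longest latency-weighted path to the end."""
    users = {name: [] for name in instrs}
    for name, (_, deps) in instrs.items():
        for dep in deps:
            users[dep].append(name)
    prio = {}

    def visit(name):
        if name not in prio:
            prio[name] = instrs[name][0] + max((visit(u) for u in users[name]), default=0)
        return prio[name]

    for name in instrs:
        visit(name)
    return prio


def list_schedule(instrs):
    """Greedy list scheduling; returns (issue order with cycles, finish time)."""
    prio = critical_path(instrs)
    done_at, schedule, cycle = {}, [], 0
    while len(schedule) < len(instrs):
        ready = [name for name, (_, deps) in instrs.items()
                 if name not in done_at
                 and all(d in done_at and done_at[d] <= cycle for d in deps)]
        if ready:
            name = max(ready, key=lambda n: prio[n])
            done_at[name] = cycle + instrs[name][0]
            schedule.append((cycle, name))
        cycle += 1  # a cycle with nothing ready is a stall
    return schedule, max(done_at.values())


def in_order(instrs):
    """Finish time when instructions issue in program order, stalling on operands."""
    done_at, cycle = {}, 0
    for name, (latency, deps) in instrs.items():
        cycle = max([cycle] + [done_at[d] for d in deps])
        done_at[name] = cycle + latency
        cycle += 1
    return max(done_at.values())


# name -> (latency in cycles, operands it waits for); program order is dict order.
block = {
    "a": (3, []),          # load x
    "b": (3, []),          # load y
    "c": (1, ["a", "b"]),  # add a, b
    "d": (3, []),          # load z
    "e": (2, ["c", "d"]),  # mul c, d
}

print("in order finishes at cycle", in_order(block))
print("list schedule:", list_schedule(block))
```

Moving the independent load of z ahead of the add fills a slot that would otherwise be a stall, which is what avoiding hardware stalls means in practice.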

Lecture_03.ppt


Editor's Notes

  • #3 This lecture covers the structures a parser works with. A parse can be represented by a tree, called a parse tree or syntax tree (a tree is a data structure).
  • #4 A parse can be represented by a tree, called a parse tree or syntax tree. In the last lecture we saw how to derive a sentence by starting from the start symbol of the grammar and applying production rules. If applying production rules leads to the final sentence, that sentence belongs to the language of the given CFG. A parser is a program that implements these algorithms: it carries out the same derivation to check whether a given sentence belongs to the language. The slide shows the parse tree a parser generates for the derivation of x+2-y, built the same way as the derivation in the last lecture. We start from goal, which is replaced by expr using the rule goal → expr. That expr is expanded by the rule expr → expr op term, and the expr on its right-hand side is expanded by the same rule again. The innermost expr is then replaced by term, and term is replaced by <id,x>, a token, that is, a terminal symbol. Moving right, the next op is replaced by +, the next term by <number,2>, the following op by -, and the last term by <id,y>. The result is a tree-like structure. Derivations can be carried out in different ways; revise the various types of derivation. This parse tree holds all the information used during the derivation (the Result column of the derivation in the last lecture is what builds this tree). However, the parser does not need all of this derivation detail; the parsing process needs less information.
  • #5 From the parsing point of view, the parse tree contains a lot of unneeded information, including all the nonterminals as interior nodes and the terminals as leaf nodes. The parse tree is neither balanced nor binary: nodes have varying numbers of children. The parser therefore uses another notation, the abstract syntax tree (AST), shown on the next slide.
  • #6 This tree shows only the terminal symbols, and it still captures the sentence x+2-y. ASTs are one kind of intermediate representation (IR). We only have to check whether the given sentence can be derived from the grammar, and the AST provides this information, so the parser does not need the complete derivation to complete its task. A condensed version of the parse tree is therefore generated in the form of an AST. The parser takes the entire source file and checks whether every sentence in it is legal. Legal means syntactically correct: the compiler has the rules and grammar of the language and checks every sentence written in the source file against them. The parser produces IR as its output, and this AST is one form of IR.
  • #7 A two-pass compiler has a front end and a back end. The front end produces IR, which is the input to the back end. IR is not an executable file that the computer can run. An executable file contains machine code (1s and 0s). It is made by assembling the assembly code into object code, which is given to a linker; the linker combines this object code with library object code and other object files to make an executable. The assembly code needed to produce the executable is what the back end generates. A loader eventually loads the executable into memory, where a process is created. It is important to note that if the assembly code generated by the back end is not well optimized, the program can run slowly. The back end has three major parts: instruction selection, register allocation, and instruction scheduling. More is involved in the back end; that will be discussed later.
  • #8 The back end of the compiler translates IR into target machine code. It chooses machine (assembly) instructions to implement each IR operation. Assembly code is a textual form of machine code, written in the instruction set of the target machine. Assembly code gives more control over the machine than is possible in high-level languages. The back end has to take care not to access areas that can cause problems for the system, so it ensures conformance with system interfaces. It decides which values to keep in registers in order to avoid memory accesses; a memory access is far slower than a register access. All of these issues are important and need detailed discussion, which will come later. For the time being, this is just an overview of the major tasks of the back end.
  • #9 Instruction selection: The goal of the back end is to produce fast and compact code. Fast means the code must execute quickly, so that the running time of an algorithm studied in the analysis of algorithms course (for example, n log n) is reflected in the generated code. Along with time, memory is an important resource and must be used efficiently. The compiler generates assembly code, and the final executable also has to reside in memory, although it is not placed in memory completely: the executable is divided into pages, some of which are in memory while the others reside on disk. The back end should therefore generate compact code that does not occupy much space. Here the compiler's responsibility is to choose the better assembly code: if a task takes 4 lines of code and the same task can be done in 2 lines, the compiler should choose the 2 lines. Digital cameras and other devices such as air conditioners and refrigerators have limited memory, so their executables must be compact. Instruction selection is usually viewed as a pattern-matching problem that can be solved by algorithms based on dynamic programming. It is a pattern-matching problem because, when the IR arrives as an AST, we look for patterns in it: the structure of the tree, how many nodes and levels it has, what kinds of nodes it has, and, for operator nodes, what their operands are. With dynamic programming we try the different choices, store the results, and inductively build up the final answer in bottom-up fashion. Assembly languages offer several addressing modes (direct addressing, indirect addressing, and so on), so it is the back end's responsibility to choose suitable assembly code at this stage.
  • #10 Why did instruction selection become a compiler problem? Instruction selection in compilers was spurred by the advent of the VAX-11, which had a CISC (Complex Instruction Set Computer) architecture and a large instruction set for the compiler to work with. In the 1970s computers were owned by big companies and were very expensive (costing millions of dollars). The VAX-11 is a discontinued family of superminicomputers developed and manufactured by Digital Equipment Corporation (DEC) and announced in 1977; they were the first computers to implement the Virtual Address eXtension (VAX) instruction set architecture (ISA). They cost a few hundred thousand dollars and hence were much cheaper than traditional mainframes. An earlier popular DEC series was the Programmed Data Processor PDP-11, the predecessor of the VAX-11. On these machines the assembly language had a comprehensive instruction set rather than just loads and stores: a complex instruction set that allowed many tasks to be performed by a single instruction, with instructions for string comparison, bulk move (moving a block of memory), and so on. CISC thus answered the wish of compiler writers for a comprehensive instruction set: instead of writing code for byte-by-byte comparison, they could use a single assembly instruction to compare two strings. However, with the introduction of complex instruction sets the number of instructions also increased, and the task of compiler writers became more complex, so they developed different algorithms for the back end, which will be discussed later. Because the larger instruction sets made compilers difficult to develop, people thought about reducing the size of the instruction set and making it simple; hence the RISC (Reduced Instruction Set Computer) architecture was introduced, so that compilers could generate fast, compact code more easily. Because of these large machine instruction sets, the instruction selection module was implemented in compilers, and it is the responsibility of this module to choose appropriate machine instructions.
  • #11 As you have studied, one of the units of a CPU is the register file. Registers are storage locations inside the CPU where it can hold results temporarily. Data is loaded from memory into registers and stored from registers back to memory. Registers operate at CPU clock speed, so accessing them is fast; memory is much slower. From a programming point of view, to execute code fast we try to keep values in registers during computation. As compiler writers, we have the IR of a program, which could have been written in any high-level language, and we have to generate fast and compact code for it. To generate fast code, we keep values in registers and avoid memory accesses, so registers are a key to performance. In a RISC architecture, computation is done on registers and memory is touched only by loads and stores, which is one reason programs run fast on such architectures. While generating assembly code, the compiler therefore places data in registers for computation whenever it can. Registers are limited in number and a few are special purpose, so the compiler must manage them efficiently to improve performance. When declaring an int variable in C++ we can use the keyword register (register int i;), which gives the compiler a hint to keep the value in a register if possible; note that this keyword was deprecated in C++11 and removed in C++17, and modern compilers generally make this decision themselves. The reason for the hint is that i may be used in a loop as an array index and accessed again and again, so keeping it in a register is faster than fetching it from and storing it to memory each time.
  • #12 Instruction scheduling: Modern processors have multiple functional units. The back end needs to schedule instructions to avoid hardware stalls and interlocks. The generated code should use all functional units productively. Optimal scheduling is NP-Complete in nearly all cases.