Compiler construction

Compiler Construction

Phases of a compiler
Analysis and synthesis phases
-------------------
-> Compilation Issues
-> Phases of compilation
-> Structure of compiler
-> Code Analysis

Published in: Software

  1. Jeena Thomas, Asst Professor, CSE, SJCET Palai
  2. » Phases of a compiler » Analysis and synthesis phases
  3. » A compiler is a kind of translator: software that accepts text in a certain language (the source language) and produces text in another language (the target or object language), preserving the meaning of the text.
  4. » A translator is a generalized form of a compiler. » When the object language is a low-level language, such a translator is called a compiler. » This conversion is essential for the hardware to interpret and carry out the semantics of the input program. » As an important part of this translation process, the compiler reports to its user the presence of errors in the source program.
  6. » A compiler is a program that reads a program written in a source language and translates it into an equivalent program in a target language.
  7. » Source code:
        a=(b+c)*(b+c)*2
     » Target code (destination operand first):
        MOV R2, b
        ADD R2, c
        MUL R2, R2
        MUL R2, #2.0
        MOV a, R2
  8. » FORTRAN compilers of the late 1950s » 18 person-years to build
  9. » Writing a compiler gives a student experience with large-scale application development. Your compiler may be the largest program you write as a student. Experience working with really big data structures and complex interactions between algorithms will help you on your next big programming project. » Compiler writing is one of the shining triumphs of CS theory. It demonstrates the value of theory over the impulse to just "hack up" a solution. » Compiler writing is a basic element of programming language research. Many language researchers write compilers for the languages they design. » Many applications have properties similar to one or more phases of a compiler, and compiler expertise and tools can help an application programmer working on projects other than compilers.
  10. » Throughout the 1950s, compilers were considered difficult programs to write. » The first FORTRAN compiler took 18 staff-years to implement. » Since then, systematic techniques have been developed for handling many of the important tasks that occur during compilation, along with good implementation languages, programming environments and software tools. » With these advances, a substantial compiler can be implemented even as a student project in a one-semester compiler-design course.
  11. » Compiler technology is more broadly applicable and has been employed in rather unexpected areas: » Text-formatting languages and preprocessor packages » Silicon compilers for the creation of VLSI circuits » Command languages of operating systems » Query languages of database systems
  12. » Hierarchy of operations: must be maintained to determine the correct order of evaluation of expressions. » Data type integrity: must be maintained, since each part of a complex expression can be of a different type. » User-defined data types: the compiler must have prior knowledge about the nature of user-defined data types such as struct, enum and union. » Storage mapping: appropriate storage mappings must be chosen for data structures, i.e. memory must be allocated for data.
  13. » The compiler must resolve each occurrence of a variable name in a program to determine the name space to which the referenced variable belongs (this is done using the symbol table). » The compiler should have facilities to handle control structures such as ‘if-then-else’, ‘for’ and ‘while’. For loops, it should have the facilities to increment the loop variable and terminate the loop.
  14. » Because the process of compilation is highly complex, it is split into a series of subprocesses called phases. » A phase is a logically cohesive operation that takes as input one representation of the source program and produces as output another representation. » The activities of compilation are split into two parts: 1) the analysis part and 2) the synthesis part.
  16. » Analysis of the source program » is done by the front end of the compiler. » It determines the meaning of the source string. » Synthesis of the target program » is done by the back end of the compiler. » An equivalent target string is constructed from the output given by the front end of the compiler.
  17. » In compiling, analysis has three phases: » Linear analysis: the stream of characters is read from left to right and grouped into tokens; known as lexical analysis or scanning. » Hierarchical analysis: tokens are grouped hierarchically into nested collections with collective meaning; known as parsing or syntax analysis. » Semantic analysis: checks whether the components of the program fit together meaningfully.
  18. » Synthesis involves: » Optimization of code » Allocation of memory » Generation of code
  21. » Lexical analysis performs the linear analysis on the source program. » It reads the stream of characters making up the source program from left to right and groups them into tokens. » A token is a sequence of characters that has a collective meaning. » For each token identified, this phase also determines its category (identifier, constant, operator or reserved word) and an attribute; for identifiers and constants, the attribute gives the symbol's position in the symbol table.
  23. » Identifiers are names of variables, constants, functions, data types, etc. » The symbol table stores the information associated with identifiers. » The information associated with different kinds of identifiers can differ. » For variables: name, type, address, size (for arrays), etc. » For functions: name, type of return value, parameters, address, etc.
  24. » Consider the following statement:
        a=(b+c)*(b+c)*2        (1)
  25. Symbol   Category     Attribute
      a        Identifier   #1
      =        Operator     Assignment(1)
      b        Identifier   #2
      +        Operator     Arithmetic(1)
      c        Identifier   #3
      *        Operator     Arithmetic(2)
      (        Operator     Open parenthesis(1)
      )        Operator     Closed parenthesis(1)
      2        Constant     #4
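The scanning step described in slides 21 and 25 can be sketched in Python. This is a minimal illustration, not the slide author's method: the token patterns, the category names and the `tokenize` function are assumptions, operator attributes are simply left empty, and identifiers and constants share one numbering (#1, #2, ...) as in the table above.

```python
import re

# Hypothetical token specification, tried in this order at each position.
TOKEN_SPEC = [
    ("Constant", r"\d+(?:\.\d+)?"),
    ("Identifier", r"[A-Za-z_]\w*"),
    ("Operator", r"[=+\-*/()]"),
    ("Skip", r"\s+"),
]
SCANNER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(source):
    """Read source left to right and return (lexeme, category, attribute) triples."""
    positions = {}  # lexeme -> position in the symbol table (#1, #2, ...)
    tokens = []
    pos = 0
    while pos < len(source):
        match = SCANNER.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at offset {pos}")
        pos = match.end()
        category, lexeme = match.lastgroup, match.group()
        if category == "Skip":
            continue
        if category in ("Identifier", "Constant"):
            # Each distinct identifier or constant gets one entry; repeats reuse it.
            positions.setdefault(lexeme, len(positions) + 1)
            tokens.append((lexeme, category, f"#{positions[lexeme]}"))
        else:
            tokens.append((lexeme, category, None))
    return tokens
```

For statement (1), `tokenize("a=(b+c)*(b+c)*2")` returns 15 tokens, beginning with `('a', 'Identifier', '#1')` and ending with `('2', 'Constant', '#4')`; the second `b` and `c` reuse `#2` and `#3`.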
  26. » Syntax analysis performs hierarchical analysis on the source program. » Here the tokens are grouped into hierarchically nested collections with collective meaning, called expressions or statements. » It determines the structure of the source program according to the grammar (syntax) of the language. » These grammatical phrases are represented in the form of a parse tree.
  27. » A parse tree describes the syntactic structure of the input. » The terminal nodes represent the tokens and the interior nodes represent the expressions.
  28. » Syntactic structure can also be represented using syntax trees. » A syntax tree is a compressed representation of the parse tree in which the operators appear as interior nodes and the operands of each operator appear as its children.
  29. » A syntax tree is a compressed representation of a parse tree. » The interior nodes of a syntax tree represent operators, whereas the interior nodes of a parse tree represent expressions. » The leaf nodes of a syntax tree represent operands, whereas the leaf nodes of a parse tree represent tokens.
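To show how tokens can be grouped into a syntax tree for statement (1), here is a small recursive-descent parser. The grammar below and the representation of interior nodes as `(operator, left, right)` tuples are assumptions made for illustration; a real compiler would normally consume the lexical analyzer's tokens instead of re-scanning the string with a regular expression.

```python
import re

IDENT = r"[A-Za-z_]\w*"
OPERAND = rf"\d+|{IDENT}"


def parse(source):
    """Recursive-descent parser for the hypothetical grammar
         stmt   -> id '=' expr
         expr   -> term   { ('+' | '-') term }
         term   -> factor { ('*' | '/') factor }
         factor -> id | number | '(' expr ')'
    Interior nodes are (operator, left, right) tuples; leaves are operand strings."""
    # Simplification: characters matching no token pattern are skipped.
    tokens = re.findall(rf"{OPERAND}|[=+\-*/()]", source)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        pos += 1
        return tokens[pos - 1]

    def expect(token):
        if peek() != token:
            raise SyntaxError(f"expected {token!r}, found {peek()!r}")
        advance()

    def factor():
        if peek() == "(":
            advance()
            node = expr()
            expect(")")
            return node
        if peek() is None or not re.fullmatch(OPERAND, peek()):
            raise SyntaxError(f"unexpected token {peek()!r}")
        return advance()  # leaf node: an operand

    def term():
        node = factor()
        while peek() in ("*", "/"):
            op = advance()
            node = (op, node, factor())  # left-associative
        return node

    def expr():
        node = term()
        while peek() in ("+", "-"):
            op = advance()
            node = (op, node, term())
        return node

    if peek() is None or not re.fullmatch(IDENT, peek()):
        raise SyntaxError("a statement must start with an identifier")
    target = advance()
    expect("=")
    tree = ("=", target, expr())
    if peek() is not None:
        raise SyntaxError(f"unexpected trailing token {peek()!r}")
    return tree
```

`parse("a=(b+c)*(b+c)*2")` yields `('=', 'a', ('*', ('*', ('+', 'b', 'c'), ('+', 'b', 'c')), '2'))`: operators are interior nodes and operands are leaves, as slides 28 and 29 describe. Putting `*` in a lower grammar level than `+` is what makes it bind more tightly.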
  30. » Semantic analysis » Goal: to determine the meaning of a source string. » It checks the source program for semantic errors and gathers type information that is used in subsequent phases of compilation. » Type checking of operations is also performed during this phase. » Output: an annotated tree.
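A sketch of type checking on the syntax tree of statement (1), assuming (as in the symbol table of slide 42) that `a`, `b` and `c` are of type real while the constant `2` is an integer. The hypothetical `check` function returns an annotated tree in which the integer operand of a mixed int/real operation is wrapped in an explicit `inttoreal` conversion node, corresponding to the `inttoreal(2)` that appears in the intermediate code.

```python
# Assumed to come from the symbol table: a, b and c are declared real.
SYMBOL_TYPES = {"a": "real", "b": "real", "c": "real"}


def check(node):
    """Return (annotated_node, type) for a syntax tree of (op, left, right) tuples.

    In a mixed int/real operation the int operand is wrapped in an
    explicit ("inttoreal", operand) conversion node."""
    if isinstance(node, str):  # leaf: constant or identifier
        if node.isdigit():
            return node, "int"
        if node not in SYMBOL_TYPES:
            raise TypeError(f"undeclared identifier {node!r}")
        return node, SYMBOL_TYPES[node]

    op, left, right = node
    left, left_type = check(left)
    right, right_type = check(right)
    if op == "=":
        if left_type == "int" and right_type == "real":
            raise TypeError("cannot assign a real value to an int variable")
        if left_type == "real" and right_type == "int":
            right = ("inttoreal", right)
        return (op, left, right), left_type
    if left_type != right_type:  # coerce the int side to real
        if left_type == "int":
            left = ("inttoreal", left)
        else:
            right = ("inttoreal", right)
        return (op, left, right), "real"
    return (op, left, right), left_type
```

Only the two types int and real are modelled here; a full semantic analyzer would also check operator applicability, array indexing, function calls and so on.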
  31. » Intermediate code generation is a part of the synthesis process of the compiler. » The intermediate code is a program for an abstract machine. » Optimization and code generation are then performed on the intermediate code.
  32. » The intermediate code should be easy to generate from the semantic representation of the source program. » It should be easy to translate into the target language. » It should be capable of holding the values computed during translation. » It should maintain the precedence ordering of the source language. » It should be capable of holding the correct number of operands for each instruction.
  33. » Intermediate (three-address) code for statement (1):
        T1=inttoreal(2)
        T2=b+c
        T3=b+c
        T4=T2*T3
        T5=T4*T1
        a=T5
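Three-address code like the listing above can be produced by a post-order walk of the annotated tree that allocates a fresh temporary for each interior node. In this sketch the tree format (nested tuples with an `inttoreal` node) and the function `gen_tac` are assumptions. Because children are visited left to right, the temporaries come out in a different order from the slide, but they compute the same values.

```python
from itertools import count


def gen_tac(tree):
    """Translate an annotated assignment tree into three-address code strings.

    Interior nodes are (op, left, right) or ("inttoreal", operand) tuples;
    each one is given a fresh temporary T1, T2, ... in post-order."""
    code = []
    temps = count(1)

    def walk(node):
        if isinstance(node, str):
            return node  # operands are used directly
        if node[0] == "inttoreal":
            arg = walk(node[1])
            temp = f"T{next(temps)}"
            code.append(f"{temp}=inttoreal({arg})")
            return temp
        op, left, right = node
        left_addr, right_addr = walk(left), walk(right)  # left to right
        temp = f"T{next(temps)}"
        code.append(f"{temp}={left_addr}{op}{right_addr}")
        return temp

    _, target, expr = tree
    result = walk(expr)
    code.append(f"{target}={result}")
    return code
```

For the annotated tree of statement (1) it emits `T1=b+c`, `T2=b+c`, `T3=T1*T2`, `T4=inttoreal(2)`, `T5=T3*T4`, `a=T5`. Like the slide's version, it still computes `b+c` twice; removing that is left to the optimizer.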
  34. » The main aim of code optimization is to improve the intermediate code so that the generated code runs faster and/or occupies less space. » It involves a trade-off between compilation speed and execution speed: spending more time on optimization slows down compilation but can speed up the target program.
  35. » Optimized code (the repeated computation of b+c is eliminated):
        T1=inttoreal(2)
        T2=b+c
        T3=T2*T2
        T4=T3*T1
        a=T4
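The improvement from slide 33 to slide 35 is an instance of local common-subexpression elimination. Below is a minimal sketch over one basic block. It assumes that instructions are `(dest, op, arg1, arg2)` tuples and that compiler temporaries are named `T1`, `T2`, ... and assigned exactly once. Both conventions and the function names are assumptions for illustration.

```python
def is_temp(name):
    # Assumption: compiler temporaries are named T1, T2, ... and assigned exactly once.
    return name is not None and name[:1] == "T" and name[1:].isdigit()


def eliminate_common_subexpressions(block):
    """Local common-subexpression elimination over one basic block.

    Each instruction is a (dest, op, arg1, arg2) tuple. If a temporary would
    recompute an expression that is still available, the instruction is
    deleted and later uses of that temporary are renamed to the earlier one."""
    available = {}  # (op, arg1, arg2) -> temporary already holding that value
    alias = {}      # deleted temporary -> temporary to use instead
    result = []
    for dest, op, arg1, arg2 in block:
        arg1, arg2 = alias.get(arg1, arg1), alias.get(arg2, arg2)
        key = (op, arg1, arg2)
        if is_temp(dest) and key in available:
            alias[dest] = available[key]  # reuse the earlier result
            continue
        # Assigning to dest invalidates any available expression that reads dest.
        available = {k: v for k, v in available.items() if dest not in (k[1], k[2])}
        result.append((dest, op, arg1, arg2))
        if is_temp(dest) and op != "copy":
            available[key] = dest
    return result
```

Applied to the slide 33 code, it drops `T3=b+c` and rewrites `T4=T2*T3` as `T4=T2*T2`. It does not renumber the remaining temporaries, so they are still named `T4` and `T5` rather than `T3` and `T4`.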
  36. » The main aim of code generation is to allocate storage and generate relocatable machine or assembly code. » Memory locations and registers are allocated for variables. » The instructions in intermediate code format are converted into machine instructions.
  37. » Target code (destination operand first):
        MOV R2, b
        ADD R2, c
        MUL R2, R2
        MUL R2, #2.0
        MOV a, R2
  38. » The compiler also attempts to improve the target code produced by the code generator by choosing proper addressing modes, replacing slow instructions with faster ones and eliminating redundant instructions. » For example, MUL R2, #2 can be replaced by SHL R2 (shift left by one bit), because a left shift doubles an integer value. This replacement is valid only for integer operands, not for a floating-point multiplication such as MUL R2, #2.0.
  39. » Target code after this improvement:
        MOV R2, b
        ADD R2, c
        MUL R2, R2
        SHL R2
        MOV a, R2
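Replacing MUL with SHL is a peephole optimization known as strength reduction. Here is a minimal sketch over instructions written as strings in the slides' two-operand, destination-first style. The instruction syntax, the explicit shift count (`SHL R2, #1` rather than `SHL R2`) and the function name are assumptions. Since shifting only doubles integers, the sketch matches integer immediates such as `#2` and leaves `#2.0` alone.

```python
def strength_reduce(instructions):
    """Peephole pass: rewrite 'MUL reg, #n' as 'SHL reg, #k' when n == 2**k.

    Only integer immediates are matched, because a left shift doubles an
    integer but not a floating-point value; 'MUL R2, #2.0' is left unchanged."""
    result = []
    for instruction in instructions:
        opcode, _, operands = instruction.partition(" ")
        if opcode == "MUL":
            register, _, operand = (part.strip() for part in operands.partition(","))
            value = operand[1:]
            if operand.startswith("#") and value.isdigit():
                n = int(value)
                if n > 0 and n & (n - 1) == 0:  # n is a power of two
                    instruction = f"SHL {register}, #{n.bit_length() - 1}"
        result.append(instruction)
    return result
```

On `["MOV R2, b", "ADD R2, c", "MUL R2, R2", "MUL R2, #2", "MOV a, R2"]` only the fourth instruction changes, becoming `SHL R2, #1`. A production peephole optimizer would typically work on a structured instruction representation rather than on strings.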
  40. » 1. Symbol table management » 2. Literal table management » 3. Error detection and reporting
  41. » A symbol table is a data structure that contains a record for each identifier, with fields for the attributes of the identifier. » This data structure has facilities to manipulate (add/delete) its elements. » An identifier is entered into the symbol table when the lexical analyzer detects it; its attributes, such as its type, are usually filled in by later phases (for example, when declarations are processed). » This information is used in the later phases of the compiler, including intermediate code generation and code generation, for example to verify type information.
  42. Address   Symbol   Attribute   Memory location
      1         a        id, real    1000
      2         b        id, real    1100
      3         c        id, real    1110
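A symbol table like the one above can be sketched as a list of records, where the record's 1-based position is its address, plus a dictionary for fast lookup by name. The class and field names are hypothetical, and deletion is omitted because it would shift the addresses in this simple layout. Inserting a name twice raises an error, matching the "multiply declared identifiers" error of slide 46.

```python
from dataclasses import dataclass


@dataclass
class SymbolEntry:
    name: str
    kind: str        # e.g. "id"
    data_type: str   # e.g. "real"
    location: int    # assigned memory location


class SymbolTable:
    """One record per identifier; the address of a record is its 1-based position."""

    def __init__(self):
        self.entries = []  # entries[i] is the record at address i + 1
        self.address = {}  # name -> address, for fast lookup

    def insert(self, name, kind, data_type, location):
        if name in self.address:
            raise KeyError(f"multiply declared identifier {name!r}")
        self.entries.append(SymbolEntry(name, kind, data_type, location))
        self.address[name] = len(self.entries)
        return self.address[name]

    def lookup(self, name):
        address = self.address.get(name)
        return None if address is None else self.entries[address - 1]
```

Inserting `a`, `b` and `c` with the memory locations from the table returns addresses 1, 2 and 3, and `lookup("b").location` is then 1100.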
  43. » The literal table maintains the details of the constants and strings used in the program. » It reduces the size of a program in memory by allowing constants and strings to be reused. » The code generator also needs it to construct symbolic addresses for literals.
  44. Address   Literal   Attribute    Memory location
      4         2         const, int   1200
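Reuse of constants in a literal table can be sketched with a dictionary keyed by value and type, so that a constant appearing several times gets a single entry. The class name is hypothetical. The starting address (4, continuing after the three symbols) and memory location (1200) are taken from the table above, and the 4-byte size per integer literal is an assumption.

```python
class LiteralTable:
    """Keeps one entry per distinct constant so repeated uses share storage."""

    def __init__(self, first_address, first_location):
        self.next_address = first_address
        self.next_location = first_location
        self.entries = {}  # (value, type) -> (address, memory location)

    def intern(self, value, data_type, size):
        key = (value, data_type)
        if key not in self.entries:
            self.entries[key] = (self.next_address, self.next_location)
            self.next_address += 1
            self.next_location += size
        return self.entries[key]
```

With `LiteralTable(4, 1200)`, interning the integer 2 twice returns `(4, 1200)` both times, and a new constant 3 then receives `(5, 1204)`.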
  45. » Errors can be encountered in every phase. » After detecting an error, a phase must deal with it so that compilation can continue.
  46. » 1. Lexical analyzer: misspelled tokens » 2. Syntax analyzer: syntax errors such as a missing parenthesis » 3. Intermediate code generator: incompatible operands for an operator » 4. Code optimizer: unreachable statements » 5. Code generator: memory restrictions on storing a variable, for example when the value of an integer variable exceeds its size » 6. Symbol table: multiply declared identifiers
  47. » Show the output of all the phases of the compiler for the following line of code: » A[index]=4+2+index
  50. » Scanner generators » Parser generators » Syntax-directed translation engines » Automatic code generators » Data-flow engines
  51. » Scanner generators » They generate lexical analyzers automatically from language specifications written using regular expressions. » They generate a finite automaton to recognize the regular expressions. » Example: lex
  52. » Parser generators » They produce syntax analyzers from a context-free grammar (CFG). » Because the syntax analysis phase is highly complex and consumes both manual effort and compilation time, parser generators are highly useful. » Example: yacc
  53. » Syntax-directed translation engines » These engines have routines to traverse the parse tree and produce intermediate code. » The basic idea is that one or more translations are associated with each node of the parse tree.
  54. » Automatic code generators » These tools convert the intermediate language into machine language for the target machine using a collection of rules. » A template-matching process is used: an intermediate language statement is replaced by its equivalent machine language statements.
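Template matching can be sketched as a table that maps each intermediate-code operator to a sequence of instruction templates, filled in with the operands of each `(dest, op, arg1, arg2)` instruction. The template table, the single register `R1` and the destination-first assembly syntax are all hypothetical choices for illustration, not the output of any real code-generator generator.

```python
# Hypothetical templates: one instruction sequence per intermediate-code operator,
# in a two-operand, destination-first assembly syntax with a single register R1.
TEMPLATES = {
    "+": ["MOV R1, {arg1}", "ADD R1, {arg2}", "MOV {dest}, R1"],
    "-": ["MOV R1, {arg1}", "SUB R1, {arg2}", "MOV {dest}, R1"],
    "*": ["MOV R1, {arg1}", "MUL R1, {arg2}", "MOV {dest}, R1"],
    "copy": ["MOV R1, {arg1}", "MOV {dest}, R1"],
}


def generate(block):
    """Replace each (dest, op, arg1, arg2) instruction by its filled-in template."""
    target = []
    for dest, op, arg1, arg2 in block:
        if op not in TEMPLATES:
            raise ValueError(f"no template for operator {op!r}")
        for template in TEMPLATES[op]:
            target.append(template.format(dest=dest, arg1=arg1, arg2=arg2))
    return target
```

For `T2=b+c` followed by `a=T2` it produces `MOV R1, b`, `ADD R1, c`, `MOV T2, R1`, `MOV R1, T2`, `MOV a, R1`. The redundant store and reload of `T2` shows why template-based output is usually followed by register allocation and peephole optimization.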
  55. » Data-flow engines » They are used in code optimization. » These tools perform good code optimization using “data-flow analysis”, which gathers information about how values flow from one part of the program to another.
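One common data-flow analysis is live-variable analysis: a variable is live at a point if its current value may still be used later, and an assignment to a variable that is not live afterwards is dead code. The sketch below handles a single basic block, scanning it backwards. It assumes side-effect-free `(dest, op, arg1, arg2)` instructions, and the function names are hypothetical. A data-flow engine would instead iterate such equations over a whole control-flow graph until a fixed point is reached.

```python
def is_variable(operand):
    # Operands that are None or start with a digit (constants) are not variables.
    return operand is not None and not operand[0].isdigit()


def remove_dead_code(block, live_out):
    """Backward live-variable analysis over one basic block of (dest, op, arg1, arg2)
    instructions; an assignment whose result is not live afterwards is deleted.
    live_out is the set of variables still needed after the block."""
    live = set(live_out)
    kept = []
    for instruction in reversed(block):
        dest, _, arg1, arg2 = instruction
        if dest not in live:
            continue  # the value is never used later: dead assignment
        live.discard(dest)  # dest is defined here ...
        live.update(x for x in (arg1, arg2) if is_variable(x))  # ... and its operands are used
        kept.append(instruction)
    kept.reverse()
    return kept
```

With `live_out = {"a"}`, the block `T1=b+c`, `T2=b*b`, `a=T1` loses `T2=b*b`, because `T2` is never read.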
