
Dineshmaterial1 091225091539-phpapp02



adc material

Published in: Education, Technology

1. Explain the different phases of a compiler.

Compiler:
A compiler is a program which translates a program written in one language (the source language) into an equivalent program in another language (the target language).

    Source program → Compiler → Target program

A compiler is software for translating a high-level language (HLL) into machine-level language. It is essentially a translator, and it must know both the high-level language and the architecture of the computer. Most compilers are machine dependent, but some compilers are machine independent, e.g., Java. Turbo C is an example of a compiler common to both C and C++.

Need for a Compiler:
We need compilers because the computer cannot understand the source program directly, so the program has to be converted into a machine-understandable language. Compilers are used for this purpose. We cannot use the same compiler for all computers, because every HLL has its own syntax and every machine has its own architecture.

Phases of a Compiler:
A compiler takes as input a source program and produces as output an equivalent sequence of machine instructions. This process is so complex that it is divided into a series of subprocesses called phases. The different phases of a compiler are as follows:

Analysis phases:
1. Lexical analysis
2. Syntax analysis
3. Semantic analysis

Synthesis phases:
4. Intermediate code generation
5. Code optimization
6. Code generation

1. Lexical Analysis:
It is the first phase of a compiler. The lexical analyzer, or scanner, reads the characters in the source program and groups them into a stream of tokens.
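A scanner of this kind can be sketched in a few lines of Python. This is a minimal illustration, not a production lexer: the token classes (identifiers, integer constants, and the operators = + - * /) and the names TOKEN_SPEC, LexError, and tokenize are assumptions made for this example. Each token is returned as a (type, value) pair, and each identifier is entered into a symbol table whose 1-based index serves as the token value.

```python
import re

# Hypothetical token classes for a tiny expression language.
TOKEN_SPEC = [
    ("NUM", r"\d+"),             # integer constants
    ("ID", r"[A-Za-z_]\w*"),     # identifiers
    ("OP", r"[=+\-*/]"),         # operators
    ("SKIP", r"\s+"),            # white space is discarded
    ("ERROR", r"."),             # any other character forms no token
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))


class LexError(Exception):
    """Raised when the remaining input does not form a token."""


def tokenize(source):
    """Return (tokens, symbol_table); each token is a (type, value) pair."""
    tokens, symbols = [], []
    for m in MASTER.finditer(source):
        kind, lexeme = m.lastgroup, m.group()
        if kind == "SKIP":
            continue
        if kind == "ERROR":
            raise LexError(f"unexpected character {lexeme!r}")
        if kind == "ID":
            # Enter the lexeme once; its 1-based index is the token value,
            # giving id1, id2, id3, ... as in the text.
            if lexeme not in symbols:
                symbols.append(lexeme)
            tokens.append(("id", symbols.index(lexeme) + 1))
        elif kind == "NUM":
            tokens.append(("num", int(lexeme)))
        else:
            tokens.append(("op", lexeme))
    return tokens, symbols
```

For the input "a = b + c * 20" this yields the token types id op id op id op num with the symbol table [a, b, c], which corresponds to id1 = id2 + id3 * 20.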
The usual tokens are identifiers, keywords, constants, operators, and punctuation symbols such as commas and parentheses. Each token is a substring of the source program that is to be treated as a single unit. Tokens are of two types:
1. Specific strings, e.g., if, semicolon.
2. Classes of strings, e.g., identifiers, constants, labels.

A token is treated as a pair consisting of two parts:
1. Token type
2. Token value

The character sequence forming a token is called the lexeme for the token. Certain tokens are augmented by a lexical value: the lexical analyzer not only generates the token, but also enters the lexeme into the symbol table.

Symbol table:
1. a
2. b
3. c

Token values are represented by pairs in square brackets. The second component of the pair is an index into the symbol table, where the information about the name is kept.

For example, consider the expression

    a = b + c * 20

After lexical analysis it becomes

    id1 = id2 + id3 * 20

The lexical phase can detect errors where the characters remaining in the input do not form any token of the language, e.g., an unrecognized character.

2. Syntax Analysis:
It groups tokens together into syntactic structures called expressions. Expressions might further be combined to form statements. Often the syntactic structure can be regarded as a tree whose leaves are the tokens; such trees are called parse trees.

The parser has two functions. First, it checks whether the tokens occur in patterns that are permitted by the specification of the source language, i.e., syntax checking.
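The syntax-checking function can be illustrated with a small recursive-descent parser. This is a sketch under stated assumptions: the grammar in the docstring (assignment, + - * /, parentheses) and the names SyntaxChecker and is_valid are hypothetical, chosen only to cover the examples in this section, and the input is the sequence of token classes produced by the scanner rather than raw text.

```python
class SyntaxChecker:
    """Recursive-descent checker for a small, hypothetical grammar:

        stmt   -> id '=' expr | expr
        expr   -> term (('+' | '-') term)*
        term   -> factor (('*' | '/') factor)*
        factor -> 'id' | 'num' | '(' expr ')'

    Input is a list of token classes such as ["id", "+", "id"].
    """

    def __init__(self, tokens):
        self.tokens = list(tokens) + ["$"]   # '$' marks the end of input
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos]

    def expect(self, *allowed):
        tok = self.peek()
        if tok not in allowed:
            raise SyntaxError(f"token {self.pos}: found {tok!r}, expected one of {allowed}")
        self.pos += 1

    def stmt(self):
        # An assignment needs one extra token of lookahead to see the '='.
        if self.peek() == "id" and self.tokens[self.pos + 1] == "=":
            self.pos += 2
        self.expr()
        self.expect("$")                     # nothing may follow the statement

    def expr(self):
        self.term()
        while self.peek() in ("+", "-"):
            self.pos += 1
            self.term()

    def term(self):
        self.factor()
        while self.peek() in ("*", "/"):
            self.pos += 1
            self.factor()

    def factor(self):
        if self.peek() == "(":
            self.pos += 1
            self.expr()
            self.expect(")")
        else:
            self.expect("id", "num")         # an operand is required here


def is_valid(tokens):
    """Return True if the token sequence is permitted by the grammar."""
    try:
        SyntaxChecker(tokens).stmt()
        return True
    except SyntaxError:
        return False
```

For the token sequence id + / id (the A+/B example that follows), factor() meets the / where an operand is required, which is exactly where the error should be reported.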
For example, consider the expression A+/B. After lexical analysis, the syntax analyzer sees it as the token sequence id+/id. On seeing the /, the syntax analyzer should detect an error, because the presence of two adjacent binary operators violates the formation rules of an expression.

The second function is to make explicit the hierarchical structure of the incoming token stream by identifying which parts of the token stream should be grouped together.

Syntax analysis thus detects syntax errors, such as the misplaced operator above.

3. Semantic Analysis:
An important role of semantic analysis is type checking. Here the compiler checks that each operator has operands that are permitted by the source language specification. Consider the example x = a + b. (The original diagram for this example is not reproduced.)

The language specification may permit some operand coercions, for example when a binary arithmetic operator is applied to an integer and a real. In this case, the compiler may need to convert the integer to a real. In this phase, the compiler detects type mismatch errors.

4. Intermediate Code Generation:
It uses the structure produced by the syntax analyzer to create a stream of simple instructions. Many styles are possible; one common style uses instructions with one operator and a small number of operands. The output of the previous phase is some representation of a parse tree, and this phase transforms that parse tree into an intermediate language.

One popular type of intermediate language is called three-address code. A typical three-address code statement is A = B op C, where A, B, and C are operands and op is a binary operator. For example, assuming A, B, and C are real variables (so the integer 20 must be converted to a real), A = B + C * 20 becomes:

    T1 = inttoreal(20)
    T2 = id3 * T1
    T3 = id2 + T2
    id1 = T3

Here T1, T2, and T3 are temporary variables, and id1, id2, and id3 are the identifiers corresponding to A, B, and C.

5. Code Optimization:
It is designed to improve the intermediate code so that the object program runs faster or takes less space. Optimization may involve:
1. Detection and removal of dead code.
2. Calculation of constant expressions and terms at compile time.
3. Collapsing of repeated expressions into temporary storage.
4. Loop unrolling.
5. Moving code outside loops.
6. Removal of unnecessary temporary variables.

For example, the intermediate code for A = B + C * 20 becomes:

    T1 = id3 * 20.0
    id1 = id2 + T1

6. Code Generation:
Once optimizations are completed, the intermediate code is mapped into the target language. This involves:
1. Allocation of registers and memory.
2. Generation of correct references.
3. Generation of correct types.
4. Generation of machine code.

For example:

    MOVF id3, R2
    MULF #20.0, R2
    MOVF id2, R1
    ADDF R2, R1
    MOVF R1, id1
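The intermediate-code step of phase 4 can be sketched as a post-order walk over the expression tree that allocates a fresh temporary for every interior node. This is a minimal illustration, not the method of any particular compiler: the tuple-based tree encoding and the name gen_tac are assumptions, and the type conversion (inttoreal) that semantic analysis would insert is omitted.

```python
import itertools


def gen_tac(target, tree):
    """Return three-address code for `target = tree` as a list of strings.

    `tree` is either a leaf (a name or a constant) or a tuple
    (op, left, right) for a binary operator.
    """
    code = []
    temps = itertools.count(1)            # numbering for T1, T2, ...

    def walk(node):
        if not isinstance(node, tuple):
            return str(node)              # a leaf is its own address
        op, left, right = node
        l_addr = walk(left)               # operands first (post-order)
        r_addr = walk(right)
        temp = f"T{next(temps)}"          # fresh temporary for this node
        code.append(f"{temp} = {l_addr} {op} {r_addr}")
        return temp

    result = walk(tree)
    code.append(f"{target} = {result}")   # final copy into the target
    return code
```

Applied to id1 = id2 + id3 * 20, it produces T1 = id3 * 20, T2 = id2 + T1, id1 = T2; the final copy through T2 is the kind of unnecessary temporary that the code optimizer removes.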
2. Compiler Construction Tools:
A number of tools have been developed to help implement various phases of a compiler. Some useful compiler construction tools are as follows:
1. Parser generators
2. Scanner generators
3. Syntax-directed translation engines
4. Automatic code generators
5. Data-flow engines

1. Parser Generators:
These produce syntax analyzers, normally from input that is based on a context-free grammar. In early compilers, syntax analysis consumed not only a large fraction of the running time of a compiler, but also a large fraction of the intellectual effort of writing one. With parser generators, this phase is now one of the easiest to implement. Many parser generators utilize powerful parsing algorithms that are too complex to be carried out by hand.

2. Scanner Generators:
These automatically generate lexical analyzers, normally from a specification based on regular expressions. The basic organization of the resulting lexical analyzer is in effect a finite automaton.

3. Syntax-Directed Translation Engines:
These produce collections of routines that walk the parse tree, generating intermediate code. The basic idea is that one or more translations are associated with each node of the parse tree, and each translation is defined in terms of the translations at its neighbour nodes in the tree.

4. Automatic Code Generators:
This tool takes a collection of rules that define the translation of each operation of the intermediate language into the machine language for the target machine. The rules must include sufficient detail to handle the different possible access methods for data; e.g., variables may be in registers, in a fixed location in memory, or may be allocated a position on a stack. The basic technique is "template matching": the intermediate code statements are replaced by templates.
These templates represent sequences of machine instructions, chosen so that the assumptions about the storage of variables match from template to template.

5. Data-Flow Engines:
Much of the information needed to perform good code optimization involves "data-flow analysis": the gathering of information about how values are transmitted from one part of a program to each other part.

3. Issues in the Design of a Code Generator:
Since the code generation phase is system dependent, the following issues arise during code generation:
1. Input to the code generator
2. Target program
3. Memory management
4. Instruction selection
5. Register allocation
6. Evaluation order

1. Input to the Code Generator:
The input is an intermediate code, which may take several forms:
A) Linear representation: postfix notation
B) Three-address representation: quadruples
C) Virtual machine representation: stack machine code
D) Graphical representation: syntax trees, DAGs

The values of names in the intermediate language are assumed to be represented by quantities that the target machine can directly manipulate. Type checking is assumed to have taken place, with type-conversion operations inserted where necessary, and semantic errors are assumed to have been detected already. Thus the input to the code generator must be free of errors.

2. Target Programs:
The output of the code generator is the target program, which may take several forms:
a) Absolute machine language
b) Relocatable machine language
c) Assembly language
An absolute machine-language program can be placed in a fixed memory location and executed immediately. Examples of compilers that produce absolute code are WATFIV and PL/C.

Producing a relocatable machine-language program allows subprograms to be compiled separately. A set of relocatable object modules can be linked together and loaded for execution by a linking loader, at the added expense of linking and loading.

Producing an assembly-language program as output makes the code generation process easier, but the output has to be assembled after code generation.

3. Memory Management:
Mapping names in the source program to addresses of data objects in run-time memory is done cooperatively by the front end and the code generator. The details about a name, such as its type and width (the amount of storage needed), are available in its symbol-table entry. From the symbol table, a relative address can be determined for the name in a data area for the procedure.

If machine code is being generated, labels in three-address statements have to be converted to addresses of instructions. This process is analogous to the "backpatching" technique. For example, when we encounter quadruple j: goto i, we generate the jump instruction as follows:
i) If i < j (a backward jump), generate a jump instruction whose target address is the machine location of the first instruction in the code for quadruple i.
ii) If i > j (a forward jump), store the location of the first machine instruction generated for quadruple j on a list kept for quadruple i.
When we process quadruple i, all the jump instructions that refer to the location of i are filled in.

4. Instruction Selection:
The uniformity and completeness of the instruction set are important factors; otherwise, special handling is needed for each exception. Instruction speeds and machine idioms are also important factors. A sample target code sequence for the three-address statement X := Y + Z can be:
    MOV Y, R0    // load Y into register R0
    ADD Z, R0    // add Z to R0
    MOV R0, X    // store R0 into X

But this kind of statement-by-statement code generation often produces poor code. For example, the sequence of statements

    a := b + c
    d := a + e

would be translated into

    MOV b, R0
    ADD c, R0
    MOV R0, a
    MOV a, R0
    ADD e, R0
    MOV R0, d

Here the fourth instruction is redundant, since a is already in R0.

Similarly, the statement a := a + 1 would be translated into

    MOV a, R0
    ADD #1, R0
    MOV R0, a

If the target machine has an increment instruction, the single instruction INC a takes less time than this sequence.

5. Register Allocation:
Instructions involving register operands are usually faster than those involving operands in memory, so long-lived values that are used often should be kept in registers. Certain machines require even-odd register pairs for some operands and results. For example, in the IBM System/370 the integer division instruction

    D X, Y

uses an even/odd register pair, whose even register is X, to hold the dividend; Y is the divisor. After division, the even register holds the remainder and the odd register holds the quotient.

6. Evaluation Order:
The order in which computations are performed (i.e., the order of instruction execution) can affect the efficiency of the target code. But picking a best order is a difficult problem.
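The statement-by-statement translation discussed under instruction selection is essentially template matching: each three-address statement is replaced by a fixed instruction sequence. The sketch below illustrates this together with the INC special case. It is hypothetical throughout: the instruction names follow the text's MOV/ADD/INC notation rather than a real instruction set, only register R0 is used, constants are written with a # prefix, and the names gen_statement and gen_block are illustrative.

```python
# Opcodes for the text's two-address notation (not a real ISA).
OPCODES = {"+": "ADD", "-": "SUB", "*": "MUL"}


def operand(v):
    """Constants are immediate operands (#n); names are memory operands."""
    return f"#{v}" if v.isdigit() else v


def gen_statement(x, y, op, z):
    """Translate the three-address statement x := y op z in isolation."""
    if op == "+" and x == y and z == "1":
        return [f"INC {x}"]                   # machine idiom for x := x + 1
    return [
        f"MOV {operand(y)}, R0",              # load y into R0
        f"{OPCODES[op]} {operand(z)}, R0",    # R0 := R0 op z
        f"MOV R0, {x}",                       # store R0 into x
    ]


def gen_block(stmts):
    """Naive code generation: concatenate the template for each statement."""
    code = []
    for x, y, op, z in stmts:
        code.extend(gen_statement(x, y, op, z))
    return code
```

Because each statement is translated without regard to its neighbours, the output for a := b + c followed by d := a + e places MOV a, R0 directly after MOV R0, a; eliminating such reloads needs information that the templates alone do not carry.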
Initially, we shall avoid the problem by generating code for the three-address statements in the order in which they have been produced by the intermediate code generator.

4. Discuss the parameter-passing mechanisms.
Parameters are used to provide communication between the caller and the callee. There are four methods for associating actual and formal parameters:
1. Call-by-value
2. Call-by-reference
3. Copy-restore
4. Call-by-name

1. Call-by-Value:
Call-by-value is the simplest method of passing parameters. The actual parameters are evaluated and their r-values are passed to the called procedure. This method is used in Pascal and C.
l-value: the storage represented by an expression.
r-value: the value contained in that storage.

Call-by-value can be implemented as follows:
(i) A formal parameter is treated just like a local name, so the storage for the formals is in the activation record of the called procedure.
(ii) The caller evaluates the actual parameters and places their r-values in the storage for the formals.

2. Call-by-Reference:
This method is also called call-by-address or call-by-location. The caller passes to the called procedure a pointer to the storage location of each actual parameter. If an actual parameter is a name or an expression having an l-value, then that l-value itself is passed. However, if the actual parameter is an expression like a + b or 5 that has no l-value, then the expression is evaluated in a new location, and the address of that location is passed.

3. Copy-Restore:
This method is a hybrid between call-by-value and call-by-reference. It is also known as copy-in copy-out or value-result. The calling procedure calculates the value of each actual parameter, and that value is copied into the activation record for the called procedure. The l-values of those actual parameters having l-values are determined before the call.
When control returns, the current r-values of the formal parameters are copied back into the l-values of the actuals, using the l-values computed before the call.

4. Call-by-Name:
The called procedure is treated like a macro; that is, its body is substituted for the call in the caller, with the actual parameters literally substituted for the formals. Such a literal substitution is called macro-expansion or in-line expansion. The local names of the called procedure are kept distinct from the names of the calling procedure, and the actual parameters are surrounded by parentheses if necessary to preserve their integrity.

6. Storage Allocation Strategies:
A different storage-allocation strategy is used in each of the data areas of the run-time memory organization. They are:
1. Static allocation: lays out storage for all data objects at compile time.
2. Stack allocation: manages the run-time storage as a stack.
3. Heap allocation: allocates and de-allocates storage as needed at run time from a data area known as the heap.

These allocation strategies are applied to allocate memory for activation records, and different languages use different strategies for this purpose. For example:
FORTRAN: static allocation
Algol: stack allocation
LISP: heap allocation

1. Static Allocation:
The fundamental characteristics of static allocation are as follows:
(i) Name binding occurs during compilation, so there is no need for a run-time support package.
(ii) Bindings do not change at run time.
(iii) On every invocation of a procedure, its names are bound to the same storage locations. When control returns to a procedure, the values of the locals are the same as they were when control last left it. (This property allows the values of local names to be retained across activations of a procedure.) For example, consider:

    function F()
    {
        int a;
        print(a);
        a = 10;
    }

After F() has been called once, if it is called a second time, the value of a will initially be 10, and this is what will be printed.

(iv) The type of a name determines its storage requirement. The address for this storage is an offset from the procedure's activation record, and the compiler must decide where the activation records go, relative to the target code and to one another. Once this position has been decided, the addresses of the activation records, and hence of the storage for each name in the records, are fixed. Thus, at compile time, the addresses at which the target code can find the data it operates upon can be filled in. The addresses at which information is to be saved when a procedure call takes place are also known at compile time.

Static allocation has some limitations:
(i) The size of data objects, as well as any constraints on their positions in memory, must be known at compile time.
(ii) Recursion is not possible, because all activations of a given procedure use the same bindings for local names.
(iii) Dynamic data structures cannot be created, since no mechanism is provided for run-time storage allocation.

2. Stack Allocation:
It is based on the idea of a control stack. Storage is organized as a stack, and activation records are pushed and popped as activations begin and end, respectively. Storage for the locals in each call of a procedure is contained in the activation record for that call. Thus locals are bound to fresh storage in each activation, because a new activation record is pushed onto the stack when a call is made. The values of locals are deleted when the activation ends; i.e., the values are lost because the storage for locals disappears when the activation record is popped.

Example: activation tree (diagram not reproduced).

Calling Sequences:
A call sequence allocates an activation record and enters information into its fields. A return sequence restores the state of the machine so that the calling procedure can continue execution. Calling sequences and activation records differ, even among implementations of the same language. The code in the calling sequence is often divided between the calling procedure and the procedure it calls; there is no exact division of run-time tasks between the caller and the callee.

(Diagram not reproduced.)

The register top_sp points to the end of the machine-status field in the activation record. This position is known to the caller, so the caller can be made responsible for setting top_sp before control flows to the called procedure. The code for the callee can access its temporaries and local data using offsets from top_sp.

The call sequence is:
Caller: Evaluates the actual parameters.
        Stores the return address and the old value of top_sp.
        Increments top_sp.
Callee: Saves register values and other status information.
        Initializes local data and begins execution.

The return sequence is:
Callee: Places the return value next to the caller's activation record.
        Restores top_sp and other registers.
        Branches to the return address.
Caller: Copies the returned value into its own activation record.

Limitations of Stack Allocation:
1. Values of locals cannot be retained when an activation ends.
2. A called activation cannot outlive the caller.

3. Heap Allocation:
In the cases mentioned above as limitations of stack allocation, de-allocation of activation records cannot occur in last-in first-out fashion. The heap gives out pieces of contiguous storage for activation records.
Pieces may be de-allocated in any order, so over time the heap will consist of alternating areas that are free and in use. The heap manager is supposed to make use of the free space.

For efficiency reasons, it may be helpful to handle small activation records as a special case: for each size of interest, keep a linked list of free blocks of that size, and fill a request of size s with a block of size s', where s' is the smallest size greater than or equal to s. (Diagram not reproduced.) For large blocks of storage, use the heap manager. For large amounts of storage, the computation may take some time to use up the memory, so the time taken by the manager may be negligible compared to the computation.

The heap manager allocates memory dynamically. This comes with a run-time overhead, as the heap manager has to take care of defragmentation and garbage collection.
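The size-class scheme for small requests can be sketched as follows. This is a toy model under stated assumptions: addresses and sizes are abstract units, the class name SizeClassHeap is hypothetical, Python lists stand in for the linked free lists, and requests larger than every size class are served by a simple bump pointer that stands in for the general heap manager (releasing those large blocks is not modelled).

```python
import bisect


class SizeClassHeap:
    """Toy heap: per-size free lists for small blocks, bump pointer otherwise."""

    def __init__(self, size_classes):
        self.sizes = sorted(size_classes)         # the sizes of interest s'
        self.free = {s: [] for s in self.sizes}   # one free list per size
        self.next_addr = 0                        # start of never-used storage

    def _fresh(self, size):
        """Carve a new block from unused storage (stand-in heap manager)."""
        addr = self.next_addr
        self.next_addr += size
        return addr

    def allocate(self, s):
        """Return (address, block_size) for a request of size s."""
        i = bisect.bisect_left(self.sizes, s)     # smallest s' >= s
        if i == len(self.sizes):
            return self._fresh(s), s              # large request
        size = self.sizes[i]
        if self.free[size]:
            return self.free[size].pop(), size    # reuse a freed block
        return self._fresh(size), size

    def release(self, addr, size):
        """Put a small block back on its free list; large blocks are ignored."""
        if size in self.free:
            self.free[size].append(addr)
```

For example, with size classes 8, 16, and 32, a request of size 10 receives a 16-unit block; once that block is released, a later request of size 12 reuses the same address instead of consuming fresh storage.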