
Chapter 3.2 The Functions and Purposes of Translators

3.2 (a) Interpreters and Compilers

When electronic computers were first used, programs had to be written in machine code. This code consisted of simple instructions, each of which was represented by a binary pattern in the computer. To produce these programs, programmers had to write the instructions in binary. This not only took a long time, it was also prone to errors.

To improve program writing, assembly languages were developed. Assembly languages allowed the use of mnemonics and names for locations in memory. Each assembly instruction mapped to a single machine instruction, which meant that it was fairly easy to translate a program written in assembly language to machine code. To speed up this translation process, assemblers were written which could be loaded into the computer, so that the computer itself could translate the assembly language to machine code. Writing programs in assembly language, although easier than using machine code, was still tedious and took a long time.

After assembly languages came high-level languages, which used the type of language used by the person writing the program. Thus FORTRAN (FORmula TRANslation) was developed for science and engineering programs, and it used formulae in the same way as scientists and engineers would. COBOL (Common Business Oriented Language) was developed for business applications. Programs written in these languages needed to be translated into machine code. This led to the birth of compilers.

A compiler takes a program written in a high-level language and translates it into an equivalent program in machine code. Once this is done, the machine code version can be loaded into the machine and run without any further help, as it is complete in itself. The high-level language version of the program is usually called the source code and the resulting machine code program is called the object code. The relationship between them is shown in Fig. 3.2.a.1.
    Source Code (High-Level Language) -> COMPILER -> Object Code (Machine Language)

    Fig. 3.2.a.1

The problem with using a compiler is that it uses a lot of computer resources. It has to be loaded into the computer's memory at the same time as the source code, and there has to be sufficient memory to hold the object code. Further, there has to be sufficient memory for working storage while the translation is taking place. Another disadvantage is that when an error occurs in a program, it is difficult to pinpoint its source in the original program.

An alternative system is to use interpretation. In this system each instruction is taken in turn and translated to machine code. The instruction is then executed before the next instruction is translated. This system was developed because early personal computers lacked the power and memory needed for compilation. This method also has the advantage of producing error messages as soon as an error is encountered, which means that the instruction causing the problem can be easily identified. Against interpretation is the fact that execution of a program is slow compared to that of a compiled program. This is because the original program has to be translated every time it is executed. Also, instructions inside a loop have to be translated each time the loop is entered.

However, interpretation is very useful during program development, as errors can be found and corrected as soon as they are encountered. In fact many languages, such as Visual Basic, use both an interpreter and a compiler. This enables the programmer to use the interpreter during program development and, when the program is fully working, to translate it with the compiler into machine code. This machine code version can then be distributed to users who do not have access to the original code.

Whether a compiler or an interpreter is used, the translation from a high-level language to machine code has to go through various stages, and these are shown in Fig. 3.2.a.2.

    Source Program
    -> Lexical Analysis
    -> Syntax Analysis
    -> Semantic Analysis
    -> Intermediate Language
    -> Code Generation
    -> Code Optimisation
    -> Object Program
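The cost of re-translating instructions inside a loop, described above, can be illustrated with a minimal sketch. It rests on assumptions that are not part of the chapter: the names `translate`, `run_interpreted` and `run_compiled` are hypothetical, and Python's built-in `compile()` (which produces bytecode, not machine code) merely stands in for the translation step.

```python
translations = 0  # counts how many times a statement is translated


def translate(statement):
    """Stand-in for translation: source text becomes an executable code object."""
    global translations
    translations += 1
    return compile(statement, "<statement>", "exec")


def run_interpreted(body, repeats):
    """Translate each statement just before executing it, every time round the loop."""
    env = {"total": 0}
    for _ in range(repeats):
        for statement in body:
            exec(translate(statement), env)
    return env["total"]


def run_compiled(body, repeats):
    """Translate every statement once, then execute the stored translations."""
    env = {"total": 0}
    object_code = [translate(statement) for statement in body]
    for _ in range(repeats):
        for code in object_code:
            exec(code, env)
    return env["total"]


loop_body = ["total = total + 2", "total = total * 1"]

translations = 0
interpreted_result = run_interpreted(loop_body, 100)
interpreted_translations = translations

translations = 0
compiled_result = run_compiled(loop_body, 100)
compiled_translations = translations

print(interpreted_result, interpreted_translations)  # 200 200
print(compiled_result, compiled_translations)        # 200 2
```

Both runs give the same result, but the interpreted run translates the two statements on every pass through the loop, whereas the compiled run translates each of them only once.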
3.2 (b) Lexical Analysis

The lexical analyser uses the source program as input and creates a stream of tokens for the syntax analyser. Tokens are normally 16-bit unsigned integers. Each group of characters is replaced by a token. Single characters which have a meaning in their own right are replaced by their ASCII values. Multiple-character tokens are represented by integers greater than 255, because the values up to 255 are reserved for the ASCII codes.

Variable names need to have extra information stored about them. This is done by means of a symbol table. This table is used throughout compilation to build up information about the names used in the program. During lexical analysis only the variable's name is noted. It is during syntax and semantic analysis that details such as the variable's type and scope are added. The lexical analyser may output some error messages and diagnostics. For example, it will report errors such as an identifier or variable name which breaks the rules of the language.

At various stages during compilation it will be necessary to look up details about the names in the symbol table. This must be done efficiently, so a linear search is not sufficiently fast. In fact, it is usual to use a hash table and to calculate the position of a name by hashing the name itself. When two names are hashed to the same address, a linked list can be used to avoid the symbol table filling up.

The lexical analyser also removes comments and redundant characters such as white space (spaces, tabs, etc., which we may find useful to make the code more readable, but which the computer does not need). Often the lexical analysis takes longer than the other stages of compilation. This is because it has to handle the original source code, which can have many formats.
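The hashed symbol table with linked lists for colliding names, described earlier in this section, might be sketched as follows. This is only an illustration under stated assumptions: the class name `SymbolTable`, the table size of 8 and the hash function (sum of character codes modulo the table size) are all hypothetical choices, not something the chapter prescribes.

```python
class SymbolTable:
    """Fixed number of slots; names hashing to the same slot share a linked chain."""

    def __init__(self, size=8):
        self.slots = [None] * size  # each slot is the head of a chain, or None

    def _slot(self, name):
        # Hypothetical hash function: sum of character codes modulo table size.
        return sum(ord(ch) for ch in name) % len(self.slots)

    def lookup(self, name):
        """Follow the chain in the name's slot; return its entry or None."""
        node = self.slots[self._slot(name)]
        while node is not None:
            if node["name"] == name:
                return node
            node = node["next"]
        return None

    def insert(self, name):
        """Return the entry for name, adding it to the front of its chain if absent."""
        node = self.lookup(name)
        if node is None:
            index = self._slot(name)
            node = {"name": name, "type": None, "scope": None,
                    "next": self.slots[index]}
            self.slots[index] = node
        return node


table = SymbolTable()
table.insert("ab")   # lexical analysis notes only the name
table.insert("ba")   # hashes to the same slot as "ab", so it is chained
table.lookup("ab")["type"] = "integer"  # a later stage adds the type

print(table._slot("ab") == table._slot("ba"))  # True
print(table.lookup("ba")["name"])              # ba
print(table.lookup("zz"))                      # None
```

Only the names in one chain are compared during a lookup, rather than every name in the table as a linear search would require.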
To illustrate how much the format of source code can vary, the following two pieces of code are equivalent although their layouts differ considerably.

    IF X = Y THEN        square X
      Z := X * X
    ELSE                 square Y
      Z := Y * Y
    ENDIF
    PRINT Z

    IF X = Y THEN Z := X * X
    ELSE Z := Y * Y
    PRINT Z

When the lexical analyser has completed its task, the code will be in a standard format, which means that the syntax analyser (the next stage, which we will look at in 3.2.c) can always expect the format of its input to be the same.
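A minimal sketch of such a lexical analyser is given here, under several assumptions. The token numbers (256 upwards for keywords, `ASSIGN`, `IDENTIFIER`) and the function name `tokenise` are hypothetical; a plain Python dictionary stands in for the symbol table; comments are left out because the chapter does not define a comment syntax; and single characters are assumed to be ASCII.

```python
import re

# Hypothetical token codes, all above 255 so they cannot clash with ASCII values.
KEYWORDS = {"IF": 256, "THEN": 257, "ELSE": 258, "ENDIF": 259, "PRINT": 260}
ASSIGN = 261       # the two-character operator :=
IDENTIFIER = 262   # any variable name

LEXEME = re.compile(
    r"(?P<space>\s+)|(?P<assign>:=)|(?P<word>[A-Za-z][A-Za-z0-9]*)|(?P<char>.)"
)


def tokenise(source):
    """Return (tokens, symbol_table) for the given source text."""
    tokens = []
    symbol_table = {}  # variable name -> position in the table
    for match in LEXEME.finditer(source):
        kind, lexeme = match.lastgroup, match.group()
        if kind == "space":
            continue  # white space is discarded
        if kind == "assign":
            tokens.append((ASSIGN, None))
        elif kind == "word" and lexeme in KEYWORDS:
            tokens.append((KEYWORDS[lexeme], None))
        elif kind == "word":
            index = symbol_table.setdefault(lexeme, len(symbol_table))
            tokens.append((IDENTIFIER, index))
        else:
            tokens.append((ord(lexeme), None))  # single character: its ASCII value
    return tokens, symbol_table


layout_a = """IF X = Y THEN
    Z := X * X
ELSE
    Z := Y * Y
ENDIF
PRINT Z"""
layout_b = "IF X = Y THEN Z := X * X ELSE Z := Y * Y ENDIF PRINT Z"

tokens_a, symbols_a = tokenise(layout_a)
tokens_b, symbols_b = tokenise(layout_b)

print(tokens_a == tokens_b)  # True
print(symbols_a)             # {'X': 0, 'Y': 1, 'Z': 2}
```

The same statements laid out over six lines or on a single line produce an identical token stream, which is the standard format the syntax analyser relies on.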
3.2 (c) Syntax Analysis

This Section should be read in conjunction with Section 3.5.j, which discusses Backus-Naur Form (BNF) and syntax diagrams.

During this stage of compilation the code generated by the lexical analyser is parsed (broken into small units) to check that it is grammatically correct. All languages have rules of grammar and computer languages are no exception. The grammar of programming languages is defined by means of BNF notation or syntax diagrams. It is against these rules that the code has to be checked.

For example, taking a very elementary language, an assignment statement may be defined to be of the form

    <variable> <assignment_operator> <expression>

and expression is

    <variable> <arithmetic_operator> <variable>

and the parser must take the output from the lexical analyser and check that it is of this form.

If the statement is

    sum := sum + number

the parser will receive

    <variable> <assignment_operator> <variable> <arithmetic_operator> <variable>

which becomes

    <variable> <assignment_operator> <expression>

and then

    <assignment statement>

which is valid.

If the original statement is

    sum := sum + + number

this will be input as

    <variable> <assignment_operator> <variable> <arithmetic_operator> <arithmetic_operator> <variable>

and this does not represent a valid statement, hence an error message will be returned.

It is at this stage that invalid names can be found, such as PRIT instead of PRINT, as PRIT will be read as a variable name instead of a reserved word. This will mean that the statement containing PRIT will not parse to a valid statement. Note that in languages that require variables to be declared before being used, the lexical analyser may pick up this error because PRIT has not been declared as a variable and so is not in the symbol table.

Most compilers will report errors found during syntax analysis as soon as they are found and will attempt to show where the error has occurred. However, they may not be very accurate in their conclusions, nor may the error message be very clear.

During syntax analysis certain semantic checks are carried out. These include label checks, flow of control checks and declaration checks.

Some languages allow GOTO statements (not recommended by the authors) which allow control to be passed, unconditionally, to another statement which has a label. The GOTO statement specifies the label to which control must pass. The compiler must check that such a label exists.

Certain control constructs can only be placed in certain parts of a program. For example, in C (and C++) the continue statement can only be placed inside a loop, and the break statement can only be placed inside a loop or a switch statement. The compiler must ensure that statements like these are used in the correct place.

Many languages insist on the programmer declaring variables and their types. It is at this stage that the compiler verifies that all variables have been properly declared and that they are used correctly.
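The grammar check for the elementary assignment statement used earlier in this section can be sketched as a pair of reductions. The function names `classify` and `parse_assignment` are hypothetical, and the sketch assumes the lexemes are separated by spaces and that there are no reserved words, so every name is treated as a variable.

```python
def classify(lexeme):
    """Map a lexeme to the grammar category the parser works with."""
    if lexeme == ":=":
        return "assignment_operator"
    if lexeme in {"+", "-", "*", "/"}:
        return "arithmetic_operator"
    if lexeme.isidentifier():
        return "variable"
    return "unknown"


def parse_assignment(statement):
    """Return True if the statement reduces to <assignment statement>."""
    symbols = [classify(lexeme) for lexeme in statement.split()]
    # Reduce <variable> <arithmetic_operator> <variable> to <expression>.
    if symbols[-3:] == ["variable", "arithmetic_operator", "variable"]:
        symbols[-3:] = ["expression"]
    # The whole statement must now be
    # <variable> <assignment_operator> <expression>.
    return symbols == ["variable", "assignment_operator", "expression"]


print(parse_assignment("sum := sum + number"))    # True
print(parse_assignment("sum := sum + + number"))  # False
```

A real parser would be driven by the full BNF of the language and would report where the reduction failed; this sketch only answers whether the statement is valid.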
3.2 (d) Code Generation

It is at this stage, when all the errors due to incorrect use of the language have been removed, that the program is translated into code suitable for use by the computer's processor.

During lexical and syntax analysis a table of variables has been built up which includes details of each variable's name, its type and the block in which it is valid. The address of the variable is now calculated and stored in the symbol table. This is done as soon as the variable is encountered during code generation.

Before the final code can be generated, an intermediate code is produced. This intermediate code can then be interpreted or translated into machine code. In the latter case, the code can be saved and distributed to computer systems as an executable program.

3.2 (e) Linkers and Loaders

Programs are usually built up in modules. These modules are then compiled into machine code that has to be loaded into the computer's memory. This is done by the loader. The loader decides where in memory to place the code and then adjusts memory addresses as described in Chapter 3.1. As the whole program may consist of many modules, all of which have been separately compiled, the modules have to be correctly linked once they have been loaded. This is the job of the linker. The linker calculates the addresses of the separate pieces that make up the whole program and then links these pieces so that all the modules can interact with one another.
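The address calculations carried out when modules are loaded and linked can be sketched as follows. The module layout is entirely hypothetical (the dictionary keys `name`, `size`, `defines` and `uses`, the module names and the load address 1000 are assumptions made for illustration), and the sketch combines the loader's choice of start address with the linker's calculation in one function, `link`.

```python
def link(modules, load_address=0):
    """Place modules one after another and resolve references between them."""
    base = {}      # module name -> address at which the module starts
    symbols = {}   # exported name -> absolute address
    next_free = load_address
    # Pass 1: give each module a base address and record where its names end up.
    for module in modules:
        base[module["name"]] = next_free
        for symbol, offset in module["defines"].items():
            symbols[symbol] = next_free + offset
        next_free += module["size"]
    # Pass 2: replace every reference to another module's name with its address.
    resolved = {}
    for module in modules:
        for symbol in module["uses"]:
            if symbol not in symbols:
                raise KeyError(f"unresolved symbol: {symbol}")
            resolved[(module["name"], symbol)] = symbols[symbol]
    return base, resolved


modules = [
    {"name": "main", "size": 100, "defines": {"start": 0}, "uses": ["sqrt"]},
    {"name": "maths", "size": 40, "defines": {"sqrt": 8}, "uses": []},
]

base, resolved = link(modules, load_address=1000)
print(base)      # {'main': 1000, 'maths': 1100}
print(resolved)  # {('main', 'sqrt'): 1108}
```

Here the separately compiled module main can call sqrt in the module maths because the reference has been replaced by the address at which sqrt was actually placed.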