Language Translators

Language translators convert programming source code into language that the computer processor understands. Programming source code has various structures and commands, but computer processors only understand machine language. Different types of translation must occur to turn programming source code into machine language, which is made up of bits of binary data. The three major types of language translators are compilers, assemblers, and interpreters.

1. Compilers

Most 3GL and higher-level programming languages use a compiler for language translation. A compiler is a special program that takes written source code and turns it into machine language. When a compiler executes, it analyzes all of the language statements in the source code and builds the machine language object code. After a program is compiled, it is in a form that the processor can execute one instruction at a time.

In some operating systems, an additional step called linking is required after compilation. Linking resolves the relative locations of instructions and data when more than one object module needs to be run at the same time and the modules cross-reference each other's instruction sequences or data.

Most high-level programming languages come with a compiler. However, object code is unique for each type of computer, so many different compilers exist for each language in order to translate for each type of computer. In addition, the compiler industry is quite competitive, so there are actually many compilers for each language on each type of computer. Although they require an extra step before execution, compiled programs often run faster than programs executed using an interpreter.

A compiler is a computer program (or set of programs) that transforms source code written in a computer language (the source language) into another computer language (the target language, often having a binary form known as object code).
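To make the linking step described above more concrete, here is a minimal sketch of a toy linker. The object-module format (`ObjectModule` with `code`, `defines`, and `references`) is hypothetical and far simpler than real object-file formats; the sketch only shows how a linker can lay modules out one after another in memory and then patch each module's references to symbols defined in another module.

```python
from dataclasses import dataclass, field


@dataclass
class ObjectModule:
    """Toy object module: code words plus symbol information (hypothetical format)."""

    name: str
    code: list[int]
    # Symbols this module defines, as offsets from the start of the module.
    defines: dict[str, int] = field(default_factory=dict)
    # (index in code, symbol) pairs to be patched with the symbol's absolute address.
    references: list[tuple[int, str]] = field(default_factory=list)


def link(modules: list[ObjectModule], base: int = 0) -> list[int]:
    """Place modules one after another in memory and resolve cross-references."""
    # Step 1: give each module a load address and build a global symbol table.
    symbols: dict[str, int] = {}
    address = base
    for module in modules:
        for symbol, offset in module.defines.items():
            if symbol in symbols:
                raise ValueError(f"duplicate symbol: {symbol}")
            symbols[symbol] = address + offset
        address += len(module.code)

    # Step 2: copy each module's code and patch every reference with an absolute address.
    image: list[int] = []
    for module in modules:
        patched = list(module.code)
        for index, symbol in module.references:
            if symbol not in symbols:
                raise ValueError(f"undefined symbol: {symbol}")
            patched[index] = symbols[symbol]
        image.extend(patched)
    return image


if __name__ == "__main__":
    # "main" refers to symbols in "lib", and "lib" refers back to "main".
    main = ObjectModule("main", [1, 0, 2, 0], defines={"start": 0},
                        references=[(1, "helper"), (3, "counter")])
    lib = ObjectModule("lib", [7, 0, 42], defines={"helper": 0, "counter": 2},
                       references=[(1, "start")])
    print(link([main, lib], base=100))  # [1, 104, 2, 106, 7, 100, 42]
```

Loaded at address 100, `main` occupies addresses 100 to 103 and `lib` starts at 104, so `helper` resolves to 104, `counter` to 106, and `start` to 100.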
The most common reason for wanting to transform source code is to create an executable program. The name "compiler" is primarily used for programs that translate source code from a high-level programming language to a lower-level language (e.g., assembly language or machine code). A program that translates from a low-level language to a higher-level one is a decompiler. A program that translates between high-level languages is usually called a language translator, source-to-source translator, or language converter. A language rewriter is usually a program that translates the form of expressions without a change of language.

A compiler is likely to perform many or all of the following operations: lexical analysis, preprocessing, parsing, semantic analysis, code generation, and code optimization.
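Several of the operations listed above can be illustrated with a deliberately tiny compiler for arithmetic expressions. This is a sketch under stated assumptions, not how any production compiler is built: the expression language, the tuple-based tree, and the stack-machine instructions (`PUSH`, `LOAD`, `ADD`, `SUB`, `MUL`, `DIV`) are all hypothetical, and preprocessing and semantic analysis are omitted. It shows lexical analysis (`tokenize`), parsing (`Parser`), a simple optimization (`fold_constants`), and code generation (`generate`).

```python
import re

# Toy expression language (an assumption for illustration): integers, names,
# + - * /, and parentheses. Each pair is (token kind, regular expression).
TOKEN_SPEC = [
    ("NUM", r"\d+"),
    ("NAME", r"[A-Za-z_]\w*"),
    ("OP", r"[-+*/()]"),
    ("SKIP", r"\s+"),
    ("ERROR", r"."),
]
TOKEN_RE = re.compile("|".join(f"(?P<{kind}>{rx})" for kind, rx in TOKEN_SPEC))
OPCODES = {"+": "ADD", "-": "SUB", "*": "MUL", "/": "DIV"}


def tokenize(source):
    """Lexical analysis: split the character stream into (kind, text) tokens."""
    tokens = []
    for match in TOKEN_RE.finditer(source):
        kind, text = match.lastgroup, match.group()
        if kind == "ERROR":
            raise SyntaxError(f"unexpected character {text!r}")
        if kind != "SKIP":
            tokens.append((kind, text))
    return tokens


class Parser:
    """Syntax analysis by recursive descent; builds a tree of tuples."""

    def __init__(self, tokens):
        self.tokens, self.pos = tokens, 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else ("EOF", "")

    def advance(self):
        token = self.peek()
        self.pos += 1
        return token

    def expr(self):  # expr := term (("+" | "-") term)*
        node = self.term()
        while self.peek() in (("OP", "+"), ("OP", "-")):
            node = (self.advance()[1], node, self.term())
        return node

    def term(self):  # term := factor (("*" | "/") factor)*
        node = self.factor()
        while self.peek() in (("OP", "*"), ("OP", "/")):
            node = (self.advance()[1], node, self.factor())
        return node

    def factor(self):  # factor := NUM | NAME | "(" expr ")"
        kind, text = self.advance()
        if kind == "NUM":
            return ("num", int(text))
        if kind == "NAME":
            return ("var", text)
        if (kind, text) == ("OP", "("):
            node = self.expr()
            if self.advance() != ("OP", ")"):
                raise SyntaxError("expected ')'")
            return node
        raise SyntaxError(f"unexpected token {text!r}")


def parse(tokens):
    parser = Parser(tokens)
    tree = parser.expr()
    if parser.peek()[0] != "EOF":
        raise SyntaxError("unexpected trailing input")
    return tree


def fold_constants(node):
    """Optimization: precompute +, -, * when both operands are constants."""
    if node[0] not in OPCODES:
        return node  # a "num" or "var" leaf
    op, left, right = node[0], fold_constants(node[1]), fold_constants(node[2])
    if op != "/" and left[0] == "num" and right[0] == "num":
        a, b = left[1], right[1]
        return ("num", a + b if op == "+" else a - b if op == "-" else a * b)
    return (op, left, right)


def generate(node):
    """Code generation: post-order walk emitting stack-machine instructions."""
    if node[0] == "num":
        return [f"PUSH {node[1]}"]
    if node[0] == "var":
        return [f"LOAD {node[1]}"]
    op, left, right = node
    return generate(left) + generate(right) + [OPCODES[op]]


def compile_expression(source):
    return generate(fold_constants(parse(tokenize(source))))
```

For example, `compile_expression("x + 2 * 3")` folds `2 * 3` at compile time and yields `["LOAD x", "PUSH 6", "ADD"]`, while `compile_expression("2 * (x + 3)")` yields `["PUSH 2", "LOAD x", "PUSH 3", "ADD", "MUL"]`.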
a) NATIVE AND CROSS COMPILERS

A native or hosted compiler is one whose output is intended to run directly on the same type of computer and operating system that the compiler itself runs on. The output of a cross compiler is designed to run on a different platform. Cross compilers are often used when developing software for embedded systems that are not intended to support a software development environment.

The output of a compiler that produces code for a virtual machine (VM) may or may not be executed on the same platform as the compiler that produced it. For this reason, such compilers are not usually classified as native or cross compilers.

b) ONE-PASS AND MULTI-PASS COMPILERS

Classifying compilers by number of passes has its background in the hardware resource limitations of computers. Compiling involves performing a lot of work, and early computers did not have enough memory to contain one program that did all of this work. So compilers were split up into smaller programs, each of which made a pass over the source (or some representation of it), performing some of the required analysis and translations.

The ability to compile in a single pass is often seen as a benefit because it simplifies the job of writing a compiler, and one-pass compilers generally compile faster than multi-pass compilers. Many languages were designed so that they could be compiled in a single pass (e.g., Pascal).

The front end of a compiler analyzes the source code to build an internal representation of the program, called the intermediate representation or IR. It also manages the symbol table, a data structure mapping each symbol in the source code to associated information such as location, type, and scope. This is done over several phases, which include some of the following:

1. Line reconstruction. Languages which strop their keywords or allow arbitrary spaces within identifiers require a phase before parsing, which converts the input character sequence to a canonical form ready for the parser.
   The top-down, recursive-descent, table-driven parsers used in the 1960s typically read the source one character at a time and did not require a separate tokenizing phase. Atlas Autocode and Imp (and some implementations of Algol and Coral 66) are examples of stropped languages whose compilers would have a line reconstruction phase.
2. Lexical analysis breaks the source code text into small pieces called tokens. Each token is a single atomic unit of the language, for instance a keyword, identifier, or symbol name. The token syntax is typically a regular language, so a finite state automaton constructed from a regular expression can be used to recognize it. This phase is also called lexing or scanning, and the software doing lexical analysis is called a lexical analyzer or scanner.
3. Preprocessing. Some languages, e.g., C, require a preprocessing phase which supports macro substitution and conditional compilation. Typically the preprocessing phase occurs before syntactic or semantic analysis; e.g., in the case of C, the preprocessor manipulates lexical tokens rather than syntactic forms. However, some languages such as Scheme support macro substitutions based on syntactic forms.
4. Syntax analysis involves parsing the token sequence to identify the syntactic structure of the program. This phase typically builds a parse tree, which replaces the linear sequence of tokens with a tree structure built according to the rules of a formal grammar that defines the language's syntax. The parse tree is often analyzed, augmented, and transformed by later phases in the compiler.
5. Semantic analysis is the phase in which the compiler adds semantic information to the parse tree and builds the symbol table. This phase performs semantic checks such as type checking (checking for type errors), object binding (associating variable and function references with their definitions), or definite assignment (requiring all local variables to be initialized before use), rejecting incorrect programs or issuing warnings. Semantic analysis usually requires a complete parse tree, meaning that this phase logically follows the parsing phase and logically precedes the code generation phase, though it is often possible to fold multiple phases into one pass over the code in a compiler implementation.

2. Assembler

An assembler translates assembly language into machine language. Assembly language is one step removed from machine language. It uses computer-specific commands and a structure similar to machine language, but assembly language uses names instead of numbers.

An assembler is similar to a compiler, but it is specific to translating programs written in assembly language into machine language. To do this, the assembler takes basic computer instructions from assembly language and converts them into a pattern of bits that the computer processor uses to perform its operations.

Typically a modern assembler creates object code by translating assembly instruction mnemonics into opcodes, and by resolving symbolic names for memory locations and other entities.
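As a rough illustration of translating mnemonics into opcodes and resolving symbolic names, the following sketch assembles a program for a made-up instruction set in two passes: the first pass records the address of every label, and the second emits opcode bytes with each label replaced by its address. The mnemonics, opcode values, comment syntax, and the `DB` data pseudo-op are assumptions for illustration only.

```python
# Hypothetical instruction set: mnemonic -> (opcode byte, number of operand bytes).
OPCODES = {
    "LOAD": (0x01, 1),
    "ADD": (0x02, 1),
    "STORE": (0x03, 1),
    "JUMP": (0x04, 1),
    "HALT": (0xFF, 0),
}


def parse_line(line):
    """Split one source line into (label, mnemonic, operand); ';' starts a comment."""
    text = line.split(";", 1)[0].strip()
    label = None
    if ":" in text:
        label, text = (part.strip() for part in text.split(":", 1))
    fields = text.split()
    mnemonic = fields[0].upper() if fields else None
    operand = fields[1] if len(fields) > 1 else None
    return label, mnemonic, operand


def size_of(mnemonic):
    """Bytes occupied by one statement; DB is a pseudo-op that emits one data byte."""
    if mnemonic == "DB":
        return 1
    if mnemonic not in OPCODES:
        raise ValueError(f"unknown mnemonic: {mnemonic}")
    return 1 + OPCODES[mnemonic][1]


def assemble(source):
    statements = [parse_line(line) for line in source.splitlines()]

    # Pass 1: assign an address to every label (this builds the symbol table).
    symbols, address = {}, 0
    for label, mnemonic, _ in statements:
        if label:
            if label in symbols:
                raise ValueError(f"duplicate label: {label}")
            symbols[label] = address
        if mnemonic:
            address += size_of(mnemonic)

    # Pass 2: emit opcodes, replacing symbolic operands with their addresses.
    output = []
    for _, mnemonic, operand in statements:
        if mnemonic is None:
            continue
        if mnemonic == "DB":
            output.append(int(operand, 0) & 0xFF)
            continue
        opcode, operand_bytes = OPCODES[mnemonic]
        output.append(opcode)
        if operand_bytes:
            if operand is None:
                raise ValueError(f"{mnemonic} needs an operand")
            value = symbols[operand] if operand in symbols else int(operand, 0)
            output.append(value & 0xFF)
    return bytes(output)


PROGRAM = """
start:  LOAD value   ; forward reference: 'value' is defined later
        ADD one
        STORE value
        JUMP start
value:  DB 5
one:    DB 1
"""

if __name__ == "__main__":
    print(list(assemble(PROGRAM)))  # [1, 8, 2, 9, 3, 8, 4, 0, 5, 1]
```

Because all label addresses are collected before any code is emitted, `LOAD value` assembles correctly even though `value` is defined further down the program.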
The use of symbolic references is a key feature of assemblers, saving tedious calculations and manual address updates after program modifications. Most assemblers also include macro facilities for performing textual substitution, e.g., to generate common short sequences of instructions to run inline instead of in a subroutine.

Assemblers are generally simpler to write than compilers for high-level languages, and have been available since the 1950s. Modern assemblers, especially for RISC-based architectures such as MIPS, Sun SPARC, and HP PA-RISC, as well as x86(-64), optimize instruction scheduling to exploit the CPU pipeline efficiently.

There are two types of assemblers based on how many passes through the source are needed to produce the executable program. One-pass assemblers go through the source code once and assume that all symbols will be defined before any instruction that references them. Two-pass assemblers (and multi-pass assemblers) build a table of all symbols and their addresses in the first pass, then use the second pass to substitute those addresses wherever the symbols are referenced. The advantage of one-pass assemblers is speed, which is not as important as it once was, given advances in computer speed and capabilities. The advantage of the two-pass assembler is that symbols can be defined anywhere in the program source. As a result, the program can be organized in a more logical and meaningful way, which makes two-pass assembler programs easier to read and maintain.

More sophisticated high-level assemblers provide language abstractions such as:
- Advanced control structures
- High-level procedure/function declarations and invocations
- High-level abstract data types, including structures/records, unions, classes, and sets
- Sophisticated macro processing
- Object-oriented features such as encapsulation, polymorphism, inheritance, and interfaces

3. Interpreters

Many high-level programming languages have the option of using an interpreter instead of a compiler. Some of these languages exclusively use an interpreter. An interpreter behaves very differently from compilers and assemblers. It converts programs into machine-executable form each time they are executed. It analyzes and executes each line of source code, in order, without looking at the entire program. Instead of requiring a step before program execution, an interpreter processes the program as it is being executed.

In computer science, an interpreter is a computer program which reads source code written in a high-level programming language, transforms it into a form it can act on (often an internal representation rather than true machine code), and executes it. Using an interpreter, a single source file can produce equal results even on vastly different systems (e.g., a PC and a PlayStation 3), provided an interpreter is available for each system. Using a compiler, a single source file can produce equal results only if it is compiled to distinct, system-specific executables.

Interpreting code is slower than running compiled code because the interpreter must analyze each statement in the program each time it is executed and then perform the desired action, whereas the compiled code just performs the action within a fixed context determined by the compilation. This run-time analysis is known as "interpretive overhead".
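Interpretive overhead can be made visible without timing anything: the sketch below simply counts how often statements are analyzed. `interpret` re-analyzes every statement each time it runs, while `compile_program` analyzes each statement once and turns it into a ready-to-run step. The statement format and all names are hypothetical, and Python closures stand in for real machine code.

```python
import operator

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


class Analyzer:
    """Analyzes toy statements of the form '<name> = <operand> <op> <operand>'
    and counts how many times it has been asked to do so."""

    def __init__(self):
        self.count = 0

    def analyze(self, statement):
        self.count += 1
        target, equals, left, op, right = statement.split()
        if equals != "=" or op not in OPS:
            raise SyntaxError(f"cannot analyze {statement!r}")
        return target, left, op, right


def fetch(token, env):
    """Resolve an operand at run time: variable lookup by name, or integer literal."""
    return env[token] if token.isidentifier() else int(token)


def interpret(statements, env, iterations, analyzer):
    """Interpreter: every statement is re-analyzed each time it is executed."""
    for _ in range(iterations):
        for statement in statements:
            target, left, op, right = analyzer.analyze(statement)
            env[target] = OPS[op](fetch(left, env), fetch(right, env))
    return env


def operand(token):
    """Decide once, ahead of time, how an operand will be fetched."""
    if token.isidentifier():
        return lambda env: env[token]
    constant = int(token)
    return lambda env: constant


def compile_program(statements, analyzer):
    """'Compiler': analyze each statement once, then reuse the result on every run."""
    steps = []
    for statement in statements:
        target, left, op, right = analyzer.analyze(statement)
        steps.append((target, operand(left), OPS[op], operand(right)))

    def run(env, iterations):
        for _ in range(iterations):
            for target, get_left, apply, get_right in steps:
                env[target] = apply(get_left(env), get_right(env))
        return env

    return run


if __name__ == "__main__":
    program = ["x = x + 1", "y = y + x"]
    interp = Analyzer()
    print(interpret(program, {"x": 0, "y": 0}, 1000, interp), interp.count)  # count == 2000
    comp = Analyzer()
    run = compile_program(program, comp)
    print(run({"x": 0, "y": 0}, 1000), comp.count)  # count == 2
```

Both versions compute the same result, but running the two-statement loop 1000 times costs 2000 analyses in the interpreter and only 2 in the compiled version.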
Access to variables is also slower in an interpreter because the mapping of identifiers to storage locations must be done repeatedly at run time rather than at compile time.

There are various compromises between the development speed when using an interpreter and the execution speed when using a compiler. Some systems (e.g., some Lisp implementations) allow interpreted and compiled code to call each other and to share variables. This means that once a routine has been tested and debugged under the interpreter, it can be compiled and thus benefit from faster execution while other routines are being developed.

Many interpreters do not execute the source code as it stands but convert it into some more compact internal form. For example, some BASIC interpreters replace keywords with single-byte tokens which can be used to find the instruction in a jump table. An interpreter might well use the same lexical analyzer and parser as the compiler and then interpret the resulting abstract syntax tree.
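The keyword-tokenizing technique mentioned for BASIC can be sketched as follows. This is not the tokenization scheme of any particular BASIC: the token values, the three-keyword subset (`PRINT`, `LET`, `END`), and the simplified expression syntax are assumptions. Each statement is "crunched" once so that its keyword becomes a single byte, and at run time that byte is used as an index into a jump table of handler functions.

```python
# Hypothetical token values: one byte per keyword, starting at 0x80 so they cannot
# be confused with printable ASCII text. Real BASICs each had their own tables.
KEYWORDS = ["PRINT", "LET", "END"]
TOKEN_BASE = 0x80
TOKENS = {word: TOKEN_BASE + index for index, word in enumerate(KEYWORDS)}


def crunch(line):
    """Replace the statement's leading keyword with its single-byte token.

    Simplification: only the first word of each statement is tokenized.
    """
    keyword, _, rest = line.strip().partition(" ")
    return bytes([TOKENS[keyword.upper()]]) + rest.encode("ascii")


def evaluate(expression, variables):
    """Add up '+'-separated terms; each term is a variable name or an integer."""
    total = 0
    for term in expression.split("+"):
        term = term.strip()
        total += variables[term] if term.isidentifier() else int(term)
    return total


def do_print(args, state):
    state["output"].append(evaluate(args, state["vars"]))
    return True  # keep running


def do_let(args, state):
    name, _, expression = args.partition("=")
    state["vars"][name.strip()] = evaluate(expression, state["vars"])
    return True


def do_end(args, state):
    return False  # stop the program


# Jump table: the handler for token t is JUMP_TABLE[t - TOKEN_BASE],
# so the order must match KEYWORDS.
JUMP_TABLE = [do_print, do_let, do_end]


def run(program):
    crunched = [crunch(line) for line in program]  # keywords tokenized once, up front
    state = {"vars": {}, "output": []}
    for code in crunched:
        handler = JUMP_TABLE[code[0] - TOKEN_BASE]  # index lookup, no string comparison
        if not handler(code[1:].decode("ascii"), state):
            break
    return state["output"]


if __name__ == "__main__":
    print(run(["LET A = 2", "PRINT A", "LET A = A + 5", "PRINT A + 1", "END", "PRINT 99"]))
    # [2, 8]
```

The design choice being illustrated is that the keyword is recognized once, when the line is crunched; during execution, dispatch is a single list index rather than a comparison against every keyword name.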
A compiler takes a text file written in a programming language and converts it into binary code that a processor can understand; on Windows, for example, it produces an ".exe" file. You compile only once, then run the ".exe" file as often as you like. Borland Turbo C is a compiler: you write C in a text file, then compile it to get an .exe file.

An interpreter does a similar job, BUT at run time: each time you run the code, it is translated and executed line by line. Classic BASIC is a typical example of an interpreted language.

An assembler is similar, except that instead of taking a plain text file in a language such as C, it takes code written in assembler mnemonics and converts it into binary.

All "executable" files are binary (just 1s and 0s), which may be viewed in hex (0x12de...).

In a nutshell: A compiler takes your source code and converts it into an executable form that the computer can understand. This is a very broad explanation, though, because some compilers only go as far as converting it into a binary file that must then be "linked" with several other libraries of code before it can actually execute. Other compilers can compile straight to executable code. Still other compilers convert it to a sort of tokenized code that still needs to be semi-interpreted by a virtual machine; Java, for example, is compiled to bytecode that runs on the Java Virtual Machine.

An interpreter does not compile code. Instead, it typically reads a source code file statement by statement and then executes it. Most early forms of BASIC were interpreted languages.

An assembler is similar to a compiler, except that it takes source code written in "assembly language", which is just shorthand for the actual machine/processor-specific instructions, values, and memory locations, and converts those instructions to the equivalent machine language. The resulting executable code is very fast and small, but assembly language is very tedious to write.

Incidentally, many compilers, especially older C compilers, actually convert the C source code to assembly language and then pass it through an assembler. The benefit is that someone adept at assembly can tweak the compiler-generated assembly code for speed or size.
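To illustrate the idea of a compiler that emits assembly text rather than binary code, here is a minimal sketch that translates a C-like assignment into assembly for a hypothetical accumulator machine. The mnemonics (`LOAD`, `ADD`, `SUB`, `STORE`) and the `#` immediate syntax are invented for this example and do not correspond to the output of any real C compiler; only `+` and `-` are supported.

```python
# Hypothetical accumulator-machine mnemonics: LOAD, ADD, SUB, STORE.
# An operand written "#n" is an immediate constant; a bare name is a memory variable.
MNEMONICS = {"+": "ADD", "-": "SUB"}


def emit(mnemonic, argument):
    """Format one instruction as a line of assembly text."""
    return f"    {mnemonic:<6}{argument}"


def as_operand(token):
    return f"#{token}" if token.isdigit() else token


def compile_assignment(statement):
    """Translate 'target = a + b - 3;' into assembly text (only + and - supported)."""
    target, equals, expression = statement.strip().rstrip(";").partition("=")
    tokens = expression.replace("+", " + ").replace("-", " - ").split()
    operands, operators = tokens[0::2], tokens[1::2]
    if (not equals or len(tokens) % 2 == 0
            or any(op not in MNEMONICS for op in operators)
            or any(tok in MNEMONICS for tok in operands)):
        raise SyntaxError(f"unsupported statement: {statement!r}")
    lines = [emit("LOAD", as_operand(operands[0]))]
    for op, token in zip(operators, operands[1:]):
        lines.append(emit(MNEMONICS[op], as_operand(token)))
    lines.append(emit("STORE", target.strip()))
    return "\n".join(lines)


if __name__ == "__main__":
    # The result is ordinary text, so it can be inspected or hand-tuned before assembly.
    print(compile_assignment("total = price + tax - 5;"))
```

Because the output is plain text, a programmer could edit it, for example by reordering or removing instructions, before passing it to an assembler.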