INTRODUCTION
TO COMPILER
DEVELOPMENT
Compiler Construction
PRESENTED BY
PRESENTED BY
PRESENTED BY
D E E P
C S - E 1 9 - 0 1 5
WHAT IS COMPILER
A compiler is a special program that processes statements written in a
particular programming language and turns them into machine language or
"code" that a computer's processor uses. Typically, a programmer writes
language statements in a language such as C++ or C one line at a time using
an editor. The file that is created contains what are called the source
statements. The programmer then runs the appropriate language compiler,
specifying the name of the file that contains the source statements.
Towards the end of the 1950s, machine-independent programming
languages were first proposed. Subsequently, several experimental
compilers were developed. The first compiler was written by Grace
Hopper, in 1952, for the A-0 programming language. The FORTRAN team
led by John Backus at IBM is generally credited as having introduced the
first complete compiler in 1957. COBOL was an early language to be
compiled on multiple architectures, in 1960
HISTORY OF COMPILER
WHY DO WE NEED A
COMPILER?
A Computer understands only binary language and executes instructions
coded in binary language. It cannot execute a single instruction given in any
other form. Therefore, we must provide instructions to the computer in binary
language. Means we must write computer programs entirely in binary
language (sequence of 0s and 1s). So, there was a need of a translator that
translates the computer instructions given in English language to binary
language.
Computers do not understand human languages. In fact, at the lowest
level, computers only understand sequences of numbers that represent
operational codes (op codes for short). On the other hand, it would be very
difficult for humans to write programs in terms of op codes. Therefore,
programming languages were invented to make it easier for humans to
write computer programs.
Programming languages are for humans to read and understand. The
program (source code) must be translated into machine language so that
the computer can execute the program (as the computer only
understands machine language). The way that this translation occurs
depends on whether the programming language is a compiled language
or an interpreted language.
HOW COMPILER WORKS
WORK OF PHASES
The compilation process is a sequence of
various phases. Each phase takes input
from its previous stage, has its own
representation of source program, and
feeds its output to the next phase of the
compiler. Let us understand the phases
of a compiler.
HOW COMPILER WORK
THROUGH ITS PHASES
LEXICAL ANALYSIS


The first phase of scanner works as a text scanner. This phase scans the
source code as a stream of characters and converts it into meaningful
lexemes. Lexical analyzer represents these lexemes in the form of tokens
as:
<token-name, attribute-value>
#include <iostream>
int main() {
cout << "Hello World!";
return 0;
}
HELLO WORLD PROGRAM
SYNTAX ANALYSIS
The next phase is called the syntax analysis or
parsing. It takes the token produced by lexical
analysis as input and generates a parse tree (or
syntax tree). In this phase, token arrangements are
checked against the source code grammar, i.e. the
parser checks if the expression made by the
tokens is syntactically correct.
SEMANTIC
ANALYSIS Semantic analysis checks whether the parse tree
constructed follows the rules of language. For
example, assignment of values is between
compatible data types, and adding string to an
integer. Also, the semantic analyzer keeps track of
identifiers, their types and expressions; whether
identifiers are declared before use or not etc. The
semantic analyzer produces an annotated syntax
tree as an output.
After semantic analysis, the compiler generates an
intermediate code of the source code for the target machine.
It represents a program for some abstract machine. It is in
between the high-level language and the machine language.
This intermediate code should be generated in such a way
that it makes it easier to be translated into the target machine
code.
INTERMEDIATE CODE GENERATION
CODE OPTIMIZATION


THE NEXT PHASE DOES CODE OPTIMIZATION OF THE
INTERMEDIATE CODE. OPTIMIZATION CAN BE ASSUMED
AS SOMETHING THAT REMOVES UNNECESSARY CODE
LINES, AND ARRANGES THE SEQUENCE OF STATEMENTS
IN ORDER TO SPEED UP THE PROGRAM EXECUTION
WITHOUT WASTING RESOURCES (CPU, MEMORY).
CODE GENERATION


IN THIS PHASE, THE CODE GENERATOR TAKES THE
OPTIMIZED REPRESENTATION OF THE INTERMEDIATE
CODE AND MAPS IT TO THE TARGET MACHINE
LANGUAGE. THE CODE GENERATOR TRANSLATES THE
INTERMEDIATE CODE INTO A SEQUENCE OF (GENERALLY)
RE-LOCATABLE MACHINE CODE. SEQUENCE OF
INSTRUCTIONS OF MACHINE CODE PERFORMS THE TASK
AS THE INTERMEDIATE CODE WOULD DO.
SYMBOL TABLE


IT IS A DATA-STRUCTURE MAINTAINED THROUGHOUT ALL
THE PHASES OF A COMPILER. ALL THE IDENTIFIER'S
NAMES ALONG WITH THEIR TYPES ARE STORED HERE.
THE SYMBOL TABLE MAKES IT EASIER FOR THE
COMPILER TO QUICKLY SEARCH THE IDENTIFIER RECORD
AND RETRIEVE IT. THE SYMBOL TABLE IS ALSO USED
FOR SCOPE MANAGEMENT.
Thank you
Q&A YOU MAY ASK

Introduction to compiler development

  • 1.
    INTRODUCTION TO COMPILER DEVELOPMENT Compiler Construction PRESENTEDBY PRESENTED BY PRESENTED BY D E E P C S - E 1 9 - 0 1 5
  • 2.
    WHAT IS COMPILER Acompiler is a special program that processes statements written in a particular programming language and turns them into machine language or "code" that a computer's processor uses. Typically, a programmer writes language statements in a language such as C++ or C one line at a time using an editor. The file that is created contains what are called the source statements. The programmer then runs the appropriate language compiler, specifying the name of the file that contains the source statements.
  • 3.
    Towards the endof the 1950s, machine-independent programming languages were first proposed. Subsequently, several experimental compilers were developed. The first compiler was written by Grace Hopper, in 1952, for the A-0 programming language. The FORTRAN team led by John Backus at IBM is generally credited as having introduced the first complete compiler in 1957. COBOL was an early language to be compiled on multiple architectures, in 1960 HISTORY OF COMPILER
  • 4.
    WHY DO WENEED A COMPILER? A Computer understands only binary language and executes instructions coded in binary language. It cannot execute a single instruction given in any other form. Therefore, we must provide instructions to the computer in binary language. Means we must write computer programs entirely in binary language (sequence of 0s and 1s). So, there was a need of a translator that translates the computer instructions given in English language to binary language.
  • 5.
    Computers do notunderstand human languages. In fact, at the lowest level, computers only understand sequences of numbers that represent operational codes (op codes for short). On the other hand, it would be very difficult for humans to write programs in terms of op codes. Therefore, programming languages were invented to make it easier for humans to write computer programs. Programming languages are for humans to read and understand. The program (source code) must be translated into machine language so that the computer can execute the program (as the computer only understands machine language). The way that this translation occurs depends on whether the programming language is a compiled language or an interpreted language. HOW COMPILER WORKS
  • 6.
    WORK OF PHASES Thecompilation process is a sequence of various phases. Each phase takes input from its previous stage, has its own representation of source program, and feeds its output to the next phase of the compiler. Let us understand the phases of a compiler. HOW COMPILER WORK THROUGH ITS PHASES
  • 7.
    LEXICAL ANALYSIS The firstphase of scanner works as a text scanner. This phase scans the source code as a stream of characters and converts it into meaningful lexemes. Lexical analyzer represents these lexemes in the form of tokens as: <token-name, attribute-value>
  • 8.
    #include <iostream> int main(){ cout << "Hello World!"; return 0; } HELLO WORLD PROGRAM
  • 9.
    SYNTAX ANALYSIS The nextphase is called the syntax analysis or parsing. It takes the token produced by lexical analysis as input and generates a parse tree (or syntax tree). In this phase, token arrangements are checked against the source code grammar, i.e. the parser checks if the expression made by the tokens is syntactically correct.
  • 10.
    SEMANTIC ANALYSIS Semantic analysischecks whether the parse tree constructed follows the rules of language. For example, assignment of values is between compatible data types, and adding string to an integer. Also, the semantic analyzer keeps track of identifiers, their types and expressions; whether identifiers are declared before use or not etc. The semantic analyzer produces an annotated syntax tree as an output.
  • 11.
    After semantic analysis,the compiler generates an intermediate code of the source code for the target machine. It represents a program for some abstract machine. It is in between the high-level language and the machine language. This intermediate code should be generated in such a way that it makes it easier to be translated into the target machine code. INTERMEDIATE CODE GENERATION
  • 12.
    CODE OPTIMIZATION THE NEXTPHASE DOES CODE OPTIMIZATION OF THE INTERMEDIATE CODE. OPTIMIZATION CAN BE ASSUMED AS SOMETHING THAT REMOVES UNNECESSARY CODE LINES, AND ARRANGES THE SEQUENCE OF STATEMENTS IN ORDER TO SPEED UP THE PROGRAM EXECUTION WITHOUT WASTING RESOURCES (CPU, MEMORY).
  • 13.
    CODE GENERATION IN THISPHASE, THE CODE GENERATOR TAKES THE OPTIMIZED REPRESENTATION OF THE INTERMEDIATE CODE AND MAPS IT TO THE TARGET MACHINE LANGUAGE. THE CODE GENERATOR TRANSLATES THE INTERMEDIATE CODE INTO A SEQUENCE OF (GENERALLY) RE-LOCATABLE MACHINE CODE. SEQUENCE OF INSTRUCTIONS OF MACHINE CODE PERFORMS THE TASK AS THE INTERMEDIATE CODE WOULD DO.
  • 14.
    SYMBOL TABLE IT ISA DATA-STRUCTURE MAINTAINED THROUGHOUT ALL THE PHASES OF A COMPILER. ALL THE IDENTIFIER'S NAMES ALONG WITH THEIR TYPES ARE STORED HERE. THE SYMBOL TABLE MAKES IT EASIER FOR THE COMPILER TO QUICKLY SEARCH THE IDENTIFIER RECORD AND RETRIEVE IT. THE SYMBOL TABLE IS ALSO USED FOR SCOPE MANAGEMENT.
  • 15.