PREPARED BY:
Zinal Gohil
ASST. PROF.
(CE/IT)
(PPSU,SOE)
2. OVERVIEW OF
LANGUAGE
PROCESSOR
 Programming Languages and Language
Processors,
 Language Processing Activities,
 Program Execution,
 Fundamental of Language Processing,
 Symbol Tables;
 Data Structures for Language Processing: Search
Data structures, Allocation Data Structures
OUTLINE
 Language processors comprise compilers, assemblers
and interpreters.
 Programmers write software in a variety of languages,
and compilers and interpreters translate it into
instructions that are understood by the computer at
machine level.
 Language processing activities occur due to the difference
between how software is specified by the programmer and
how it is implemented by the computer.
 The software programmer works in two domains:
 1. APPLICATION DOMAIN : to present the idea
 2. EXECUTION DOMAIN : for carrying out these ideas
OVERVIEW OF PROGRAMMING
LANGUAGE AND LANGUAGE
PROCESSOR
 Semantics pertains to the meaning of words. The
semantics of a language is a description of what the
sentences mean. It is much more difficult to express the
semantics of a language than it is to express the syntax.
 In order to implement a programming language we must
know what each sentence means (declaration,
expression, etc).
 E.g., does the sentence
 produce an output,
 take any inputs,
 change the value stored in a variable,
 produce an error.
TERMINOLOGIES
1. SEMANTICS (MEANING)
 Domain: It refers to the scope or sphere of any activity.
 Application Domain: The scope of an application is its
application domain.
 E.g., the application domain of an inventory program is
warehouse and its associated tangibles (goods,
machinery, etc), transactions (e.g., receiving goods,
purchase orders, locating goods, shipping of goods,
receiving payments, etc), people (e.g., workers,
managers, customers).
 All the above are objects in the application domain. The
application domain can best be described by a person in
that domain. E.g., the warehouse manager in the above
example.
TERMINOLOGIES
2. APPLICATION DOMAIN
 Execution Domain (also called the solution domain).
The execution domain is the work of programmers, e.g.,
program code, documentation, test results, files, computers,
etc.
 The solution domain is partitioned into two levels:
 Abstract, high-level documents, such as flow charts,
diagrams
 Low-level – data structures, function definitions, etc.
TERMINOLOGIES
3. EXECUTION DOMAIN
 The difference between the semantics of the application
domain and the execution domain is called the semantic
gap.
TERMINOLOGIES
4. SEMANTIC GAP
 Consequences of semantic gap:
 Large development times – interaction between
designers in application domain and programmers.
 Large development efforts.
 Poor quality of software.
CONSEQUENCES OF SEMANTIC GAP
 The semantic gap is reduced by programming languages
(PL). The use of a PL introduces a new domain called the
programming language domain (or PL domain).
 The PL domain bridges the gap between the application
domain and the execution domain.
HOW IS THE SEMANTIC GAP
REDUCED?
 Specification gap: It is the semantic gap between the
application domain and the PL domain.
 It can also be defined as the semantic gap between the
two specifications of the same task.
 The specification gap is bridged by the software
development team.
 Execution gap: It is the gap between the semantics of
programs written in different programming languages.
 The execution gap is bridged by the translator or
interpreter.
SPECIFICATION GAP AND EXECUTION
GAP
Advantages of introducing the PL domain:
(a) Large development times are reduced.
(b) Better quality of software.
(c) The language processor provides diagnostic
capabilities which detect errors.
 Language Processor: software which bridges the
specification or execution gap.
 Language Processing: any activity performed by a
language processor.
 Diagnostic capability is a feature of a language
processor. The input of a language processor is the
source program and its output is the target program.
The target program is not produced if the language
processor finds any errors in the source program.
TERMS
 (a) Language Translator: bridges the execution gap to
the machine language of a computer system. Examples are
the compiler and the assembler.
 (b) De-translator: similar to a translator, but works in the
opposite direction.
 (c) Preprocessor: a language processor whose source
and target languages are both high level, i.e., no translation
to machine language takes place.
 (d) Language migrator: bridges the specification gap between
two PLs (used to convert a program written in one programming
language into another programming language). It may be used
to provide portability of a program by migrating it to a more
modern programming language.
 The quality of the target program depends on the semantics of the
two programming languages.
TYPES OF LANGUAGE PROCESSOR
 In the case of problem-oriented languages, the PL
domain is very close to the application domain, so the
specification gap is reduced. Such PLs can
be used only for specific applications; hence they are
called problem-oriented languages.
 They have a large execution gap, but the execution
gap is bridged by the translator or interpreter. Using
these languages, we only have to specify "what to
do".
 Software development takes less time using problem-
oriented languages, but the resultant code may not be
optimized. Examples: fourth-generation languages
(4GL) like SQL.
PROBLEM-ORIENTED LANGUAGES:
 These provide general facilities and features which are
required in most applications. These languages are
independent of application domains.
 Hence there is a large specification gap, which
must be bridged by the application designer. Using
these languages, we have to specify both "what to do"
and "how to do it".
 Examples: C, C++, FORTRAN, etc.
PROCEDURE-ORIENTED LANGUAGES:
 A compiler is a language translator. It translates a
source code (programs in a high-level language) into
the target code (machine code, or object code).
 To do this translation, a compiler steps through a
number of phases. The simplest is a 2-phase compiler:
the first phase is called the front end and the second
phase is called the back end.
COMPILERS
 Front End: The front end translates from the high-level
language to a common intermediate language. The front
end is source language dependent but it is machine-
independent. Thus, the front end consists of the following
phases:
 lexical analysis, syntactic analysis, creation of symbol table,
semantic analysis and generation of intermediate code. The
front end also includes error-handling routines for each of
these phases.
 Back End: The back end translates from this common
intermediate language to the machine code.
 The back end is machine dependent. This includes code
optimization, code generation, error-handling and symbol
table operations. Thus, a compiler bridges the execution gap.
COMPILERS
 It is a language processor. It also bridges the execution
gap but does not generate the machine code. An
interpreter executes a program written in a high level
language.
 The essential difference between a compiler and an
interpreter is that while a compiler generates the
machine code and is then no longer needed, an
interpreter is always required.
INTERPRETER
 Language processing activities are related to the
specification gap and the execution gap.
 They are divided into two types:
 1. Program generation
 2. Program execution
 The aim of the program generation activity is to generate
a program automatically. In this activity the specification
language of the application domain is the source language,
 and a procedure-oriented language is the target language.
 1. PROGRAM GENERATION ACTIVITY
 A program generator is system software. The program
specification is input to this software, which
generates output in the target language.
LANGUAGE PROCESSING ACTIVITIES
 Here the specification gap is the gap between the application
domain and the program generator domain.
LANGUAGE PROCESSING ACTIVITIES
[Diagram: User → Application Domain —(Specification Gap)→ Program Generator Domain → Target Programming Language Domain → Program Execution Domain]
 It reduces the specification gap, and the reliability of the
generated program increases. It also helps the programmer
to write the specification of the program easily.
 A compiler is used to bridge the gap between the target PL
and the execution domain.
 2. PROGRAM EXECUTION
 Methods of program execution
 (a) Program translation
 (b) Program interpretation
LANGUAGE PROCESSING ACTIVITIES
 (a) Program Translation
 It bridges the execution gap by translating the
source program into a target program.
 The source program is written in a programming
language; the target program is in the machine or
assembly language of the computer system.
LANGUAGE PROCESSING ACTIVITIES
 Characteristics:
 1. Before execution, the program must be translated.
 2. The translated program may be saved in files.
 3. The program must be retranslated after modifications.
 (b) PROGRAM INTERPRETATION
 The interpreter reads the source program and stores it in
main memory. During interpretation it takes a source
statement, determines its meaning, and then performs the
actions that the statement specifies.
 The actions may be computational or input-output.
LANGUAGE PROCESSING ACTIVITIES
The program counter holds the memory address of the next
instruction; the CPU uses the program counter to select the
next instruction.
 The instruction execution cycle consists of three steps:
 1. Fetch 2. Decode 3. Execute
LANGUAGE PROCESSING ACTIVITIES
(b) Execution
 This cycle is repeated for all instructions. The instruction
address in the program counter is updated at the end of
the cycle, and the CPU selects the next instruction for execution.
 The analogous process in an interpreter is called the
interpretation cycle. The interpretation cycle consists of:
 1. Fetch the statement
 2. Analyze the statement
 3. Execute the statement
 CHARACTERISTICS OF INTERPRETATION
 1. The source program is retained in source form itself.
 2. Each statement is analyzed during interpretation.
LANGUAGE PROCESSING ACTIVITIES
FUNDAMENTAL OF LANGUAGE
PROCESSING
 Language processing is the combination of analysis of the
source program (SP) and synthesis of the target program (TP).
The specification of a source program consists of three components:
 1. Lexical rules
 2. Syntax rules
 3. Semantic rules
• The source program can be analyzed in three phases:
• 1. Linear (lexical) analysis : in this type of analysis the
source string is read from left to right and grouped into
tokens.
• EX : Tokens for a language can be identifiers, constants,
relational operators, keywords.
• 2. Hierarchical (syntax) analysis : in this analysis,
characters or tokens are grouped hierarchically into
nested collections for checking them syntactically.
• 3. Semantic analysis : this kind of analysis ensures the
correctness of the meaning of the program.
ANALYSIS OF SOURCE PROGRAM
FUNDAMENTAL OF LANGUAGE
PROCESSING
 The synthesis phase is concerned with the construction of
target language statements which have the same
meaning as a source statement. It consists of two main
activities:
 Memory allocation : creation of the various data structures
of the target program.
 Code generation : generation of the target code.
PHASES OF LANGUAGE PROCESSOR
• ANALYSIS PART
• 1. LEXICAL ANALYSIS :
• Lexical analysis is also called scanning. It is the phase
of compilation in which the complete source code is
scanned and the source program is broken up into
groups of characters called tokens.
• A token is a sequence of characters having a collective
meaning.
• For example, suppose an assignment statement in the
source program is as follows:
• total = count + rate * 10
PHASES OF COMPILER
• total = count + rate * 10
• In the lexical analysis phase this statement is broken up into
the following series of tokens:
• 1. The identifier total
• 2. The assignment symbol =
• 3. The identifier count
• 4. The plus symbol +
• 5. The identifier rate
• 6. The multiplication symbol *
• 7. The constant number 10
The blank characters used in the programming
statements are eliminated during lexical analysis.
LEXICAL ANALYSIS
Parse tree for total =count + rate * 10
• Syntax analysis is also called
parsing. In this phase the tokens
generated by lexical analysis are
grouped together to form a
hierarchical structure.
• Syntax analysis determines the
structure of the source string by
grouping the tokens together.
• The hierarchical structure
generated in this phase is called a
parse tree or syntax tree.
• For the expression total = count + rate
* 10 the parse tree will look as shown above.
2. SYNTAX ANALYSIS
• In that statement, rate * 10 is considered first,
because in an arithmetic expression the multiplication
operator is performed before the addition; the addition
operation is then considered.
2. SYNTAX ANALYSIS
• Once the syntax is checked in the syntax analysis phase,
the next phase (i.e. semantic analysis) determines the
meaning of the source string: for example, matching of
parentheses in an expression, matching of
if…else statements, performing arithmetic operations
that are type compatible, or checking the scope of
variables.
 Thus the three phases perform the task of
analysis.
 After these phases, an intermediate code is generated.
3. SEMANTIC ANALYSIS
• The intermediate code is a kind of code which is easy
to generate and can be easily converted to
target code.
• This code can take a variety of forms such as three-address
code, quadruples, triples, and postfix notation.
• Intermediate code in three-address form, which resembles
an assembly language, is given below.
• The three-address code consists of instructions, each of
which has at most three operands.
• EX : t1 = inttofloat(10)
     t2 = rate * t1
     t3 = count + t2
     total = t3
4. INTERMEDIATE CODE GENERATION
• There are certain properties which should be possessed
by the three-address code:
• 1. Each three-address instruction has at most one operator
in addition to the assignment. Thus the compiler has to
decide the order of the operations devised by the three-
address code.
• 2. The compiler must generate a temporary name to hold the
value computed by each instruction.
• 3. Some three-address instructions may have fewer than
three operands, for example the first and last instructions of
the above three-address code:
• EX : t1 = inttofloat(10)
• total = t3
4. INTERMEDIATE CODE GENERATION
• The code optimization phase attempts to improve the
intermediate code.
• This is necessary to have a faster executing code or less
consumption of memory.
• Thus by optimizing the code overall running time of the
target program can be improved.
5. CODE OPTIMIZATION
• In the code generation phase the target code gets generated.
• The intermediate code instructions are translated into a
sequence of machine instructions.
• MOV rate, R1
• MUL #10.0, R1
• MOV count, R2
• ADD R2, R1
• MOV R1, total
6. CODE GENERATION
• To support the phases of the compiler, a symbol table is
maintained. The task of the symbol table is to store the
identifiers (variables) used in the program.
• The symbol table also stores information about the
attributes of each identifier. The attributes of an identifier
are usually its type, its scope, and information about the
storage allocated for it.
• The symbol table also stores information about
subroutines used in the program (in the case of a subroutine, the
symbol table stores the name of the subroutine, the number of
arguments passed to it, the types of these arguments, the
method of passing these arguments (either call by value
or call by reference) and the return type, if any).
• The symbol table allows us to find the record for each identifier
quickly and to store or retrieve data from the record
efficiently.
SYMBOL TABLE MANAGEMENT
• During compilation the lexical analyzer detects identifiers
and makes their entries in the symbol table.
• However, the lexical analyzer cannot determine all the
attributes of an identifier, so the attributes
are entered by the remaining phases of the compiler.
• Various phases use the symbol table in various ways.
EX – during semantic analysis and intermediate code
generation, we need to know what the types of identifiers are.
Then, during code generation, information about
how much storage is allocated to each identifier is used.
SYMBOL TABLE MANAGEMENT
• As programs are written by human beings, they
cannot be free from errors.
• In compilation, each phase detects errors. These errors
must be reported to the error handler, whose task is to
handle them so that compilation can proceed.
• Normally the errors are reported in the form of messages.
When input characters do not form a token,
the lexical analyzer detects this as an error.
• A large number of errors can be detected in the syntax
analysis phase. Such errors are popularly known as
syntax errors.
• During semantic analysis, type-mismatch errors are
usually detected.
ERROR DETECTION AND HANDLING
• Input a = b + c * 60
EXAMPLE ON PROCESS OF
COMPILATION
SYMBOL TABLE ENTRIES
• The compiler/interpreter uses the symbol table to achieve compile-time
efficiency.
• It associates lexical names with their attributes.
• The items to be stored in the symbol table are:
1) variable names
2) constants
3) procedure names
4) literal constants and strings
5) compiler generated temporaries
6) labels in source language
• Compiler uses following types of information from symbol table.
1) data type
2) Name
3) declaring procedures
4) offset in storage
5) if structure or record then pointer to structure
6) for parameter, whether parameter passing is by value or by
reference?
7) Number and type of arguments passed to function
8) base address
SYMBOL TABLE ENTRIES
1) Variable names : when a variable is identified, it is stored in the symbol table
by its name. The name must be unique.
2) Constants : the constants are stored in the symbol table. These constants can be
accessed by the compiler with the help of pointers.
3) Data types : the data type of the associated variable is stored in the symbol table.
4) Compiler-generated temporaries : the intermediate code is generated by the
compiler. During this process many temporaries may be generated, which are
stored in the symbol table.
5) Function names : the names of functions can be stored in the symbol table.
6) Parameter names : the parameters that are passed to a function are stored
in the symbol table. Information such as call by value or call by reference is
also stored in the symbol table.
7) Scope information : the scope of a variable, i.e., where it can be used. Scope levels:
(-1) is used to store permanent symbols such as keywords
(0) is used to store global symbols
(1) is used to store symbols defined in the main program
ATTRIBUTES OF SYMBOL TABLE
 The symbol table has the following attributes to store the
information of data:
 1. Symbol name : symbol names are the names given to
variables. They are of two types:
 (i) Fixed length
 (ii) Variable length
ATTRIBUTES OF SYMBOL TABLE
HOW TO STORE NAMES IN SYMBOL TABLE
• There are two types of name representation.
• 1. Fixed-length names
• A fixed space for each name is allocated in the symbol table. In this
type of storage, if a name is too small there is a wastage of
space.
• The name can be referred to by a pointer to its symbol table entry.
CONT…
• 2. Variable-length names
• Only the amount of space required by the string is used to store each name.
• The names can be stored with the help of the starting index and
length of each name.
• EXAMPLE
 1. Initialize the symbol table and make all its entries
empty
 2. Store a symbol and its attributes
 3. Find a symbol
 4. Insert a new symbol
 5. Delete a symbol
 6. Enter a scope level
OPERATIONS ON SYMBOL TABLE
FUNDAMENTAL OF LANGUAGE
PROCESSING
TERMS COMMONLY USED IN
STRINGS
TERM MEANING
Prefix of string A string obtained by removing zero or more tail
symbols.
For example, for the string Hindustan a prefix could be
'Hindu'
Suffix of string A string obtained by removing zero or more leading
symbols. For example, for the string Hindustan a suffix
could be 'dustan'
Substring A string obtained by removing a prefix and a suffix of a
given string is called a substring. For example, for the string
Hindustan the string 'indu' can be a substring.
Subsequence of
string
Any string formed by removing zero or more, not
necessarily contiguous, symbols. For example, 'Hisan' can
be a subsequence of Hindustan.
OPERATIONS ON LANGUAGE
OPERATION DESCRIPTION
Union of two
languages L1 and
L2
L1 U L2 = { set of strings in L1 and strings in L2 }
Concatenation of
two languages L1
and L2
L1 . L2 = { set of strings in L1 followed by set of strings
in L2 }
Kleene closure of
L
L* = L⁰ U L¹ U L² U … (i.e., the union of Lⁱ for i ≥ 0);
L* denotes zero or more concatenations of L
Positive closure of
L
L+ = L¹ U L² U … (i.e., the union of Lⁱ for i ≥ 1);
L+ denotes one or more concatenations of L
 A set which denotes a regular language, i.e., a set
which can be described by a regular expression, is
called a regular set.
 EXAMPLE : The set of identifiers is a regular set because
it can be represented using a regular expression.
REGULAR SET
Definition of Regular language
and regular expression over ∑
 The set R of regular languages over ∑ and the
corresponding regular expressions are defined
as follows:
 1. ϕ is an element of R and the corresponding regular
expression is ϕ
 2. { ^ } is an element of R and the corresponding regular
expression is ^
 3. For each a є ∑, {a} is an element of R and the
corresponding R.E. is a
Definition of Regular language and
regular expression over ∑
 4. If L1 and L2 are any elements of R and r1 and r2 are
their corresponding regular expressions, then
 (a) L1 U L2 is an element of R and the corresponding R.E. is (r1
+ r2)
 (b) L1L2 is an element of R and the corresponding R.E. is (r1 r2)
 (c) L1* is an element of R and the corresponding R.E. is (r1)*
Only those languages that can be obtained by statements 1-4 are
regular over ∑.
 EXAMPLE 1 : Write a R.E. for the language containing
the strings of length two over Σ = { 0,1 }
 R.E. = (0+1) (0+1)
 EXAMPLE 2 : Write a regular expression for the language
containing strings which end with "abb" over Σ =
{ a,b }
 R.E. = (a+b)* abb
 EXAMPLE 3 : Write a regular expression to identify an
identifier
 To denote an identifier we consider a set of letters and
digits, because an identifier is a combination of letters
and digits but always has a letter as its first
character.
 R.E. = letter (letter + digit)*
 Various tools have been built for constructing lexical
analyzers using a special-purpose notation called
regular expressions.
 The regular expressions are used in the recognition of
tokens.
 A tool called LEX provides a special language that specifies
the tokens using regular expressions.
 A LEX file has the .l extension. Suppose we create one file
x.l.
 This x.l is then given to the LEX compiler to produce lex.yy.c.
 This lex.yy.c is a C program which is actually the lexical
analyzer program.
 As the specification file stores the regular
expressions for tokens, the lex.yy.c file consists of a tabular
representation of the transition diagrams constructed for
those expressions.
A language for specifying lexical
analysis
 The lexemes can be recognized with the help of the
tabular transition diagrams and standard routines.
 In the LEX specification file, actions are associated
with each regular expression.
 These actions are simply C code.
 This C code is carried over directly into the lex.yy.c file.
 Finally, a C compiler compiles the generated lex.yy.c and
produces an object program a.out. When some
input stream is given to a.out, a sequence of
tokens is generated.
A language for specifying lexical
analysis
A language for specifying lexical
analysis
 The LEX program consists of three parts
 1. Declaration section
 2. Rule section
 3. Procedure section
A language for specifying lexical
analysis
%{
DECLARATION SECTION
%}
%%
RULE SECTION
%%
AUXILIARY PROCEDURE SECTION
In the declaration section, declarations of variables and
constants can be made.
Some regular definitions can also be written in this
section; the regular definitions are basically named
components of regular expressions.
 The rule section consists of regular expressions
associated with actions. These translation rules can be
given in the form shown below.
 The third section is the auxiliary procedure section, in which
all the required procedures are defined. Sometimes
these procedures are required by the actions in the rule
section.
 The lexical analyzer (scanner) works in coordination
with the parser.
 When activated by the parser, the lexical analyzer begins
reading its remaining input, one character at a time.
A language for specifying lexical
analysis
R1 { action1 }
R2 { action2 }
.
.
.
Rn { actionn }
where each Ri is a regular expression and
each actioni is a program fragment
describing what action is to be taken for the
corresponding regular expression
 When a string matches one of the regular
expressions Ri, the corresponding actioni is
executed, and this actioni returns control to the
parser.
 The search for lexemes is repeated in
order to return all the tokens in the source string.
 The lexical analyzer ignores white space and
comments in this process.
A language for specifying lexical
analysis
%{
#include <stdio.h>
%}
%%
Rama|Seeta|Geeta|Neeta {
printf("\nNoun");
}
sings|dances|eats {
printf("\nVerb");
}
%%
int main()
{
yylex();
return 0; }
int yywrap()
{
return 1;
}
 The program on the previous slide recognizes
nouns and verbs from the input string.
 There are three sections in that program.
 The section starting and ending with %{ and %}
respectively is the definition section.
 The section starting with %% is called the rule section;
this section is closed by %%.
 The part within %% consists of regular expressions and
actions. Rule 1 gives the definition of a noun and the second
rule gives the definition of a verb.
 The third section consists of two functions: the
main function and the yywrap function.
 Here the main function calls the yylex() function. The yylex()
function is defined in the lex.yy.c file.
A language for specifying lexical
analysis
 First we compile our program x.l using the
LEX compiler, and the LEX compiler
automatically generates a C program named lex.yy.c.
This lex.yy.c makes use of the regular expressions and
corresponding actions defined in x.l.
 Hence our program x.l is called the LEX specification
file.
 When we compile lex.yy.c using the gcc compiler as gcc
lex.yy.c, we get an output file a.out (the default
output file on the LINUX platform), and on execution of
a.out we can give an input string.
A language for specifying lexical
analysis
$ lex x.l        (generates lex.yy.c)
$ gcc lex.yy.c   (compiles lex.yy.c; cc can also be used in place of gcc)
$ ./a.out        (runs the executable file)
A language for specifying lexical
analysis
$ lex x.l
$ gcc lex.yy.c
$ ./a.out
Rama eats
Noun
Verb
Seeta sings
Noun
Verb
After entering these commands, a blank line for entering
input becomes available. We can then give some valid input,
and press CTRL+C or CTRL+D to come out of the output.
LEX specification and features
REGULAR
EXPRESSION
MEANING
* Matches zero or more occurrences of the
preceding expression. For example, 1* matches
any number of occurrences of 1
. Matches any single character other than new
line
[ ] A character class which matches with any
character within the bracket.
For example: [a-z] matches with any alphabet
in lower case.
( ) Group of regular expressions together put in
to a new regular expression
r{m,n} m to n occurrences of r. Example: a{3,5}
LEX specification and features
REGULAR
EXPRESSION
MEANING
$ Matches with the end of line as last character.
+ Matches with one or more occurrence of
preceding expression.
Example: [0-9]+ any number but not empty
string
? Matches zero or one occurrence of preceding
regular expression. For example [+-]? [0-9]+ a
number with unary operator
^ Matching the beginning of a line as first
character.
[^S] Used for negation: matches any character except
those in S. For example, [^verb] matches any
character other than v, e, r or b
\ Used as the escape metacharacter
 1. BEGIN :- It indicates the start state. The lexical
analyzer starts in state 0.
 2. ECHO :- It emits the input as it is.
 3. yytext :- when the lexer matches or recognizes a
token from the input, the lexeme is stored in a null-
terminated string called yytext.
 As soon as a new token is found, the content of yytext
is replaced by the new token.
 4. yylex() :- as soon as a call to yylex() is encountered,
the scanner starts scanning the source program.
 5. yywrap() :- the function yywrap() is called when the
scanner encounters the end of a file. If yywrap() returns
0, the scanner continues scanning. When
yywrap() returns 1, it means the end of the file has been
reached and scanning stops.
LEX Actions
 6. yyin :- It is the standard input file that stores the input
source program.
 7. yyleng :- when the lexer recognizes a token, the
lexeme is stored in the null-terminated string
yytext, and yyleng stores the length of that string; so we
can say that yyleng is the same as strlen(yytext).
 8. HOW TO WRITE main() in LEX
int main()
{
yylex();
}
LEX Actions
 9. Where to write C code?
 We can write valid 'C' code between %{ and %}.
 We can write any C function in the subroutine
section.
 C code appears in the action part for the corresponding
regular expression.
 10. THE RECOGNIZER WORKS IN THE FOLLOWING WAYS:
 i. If more than one pattern matches, the
recognizer chooses the longest lexeme
matched.
 ii. If there are two or more patterns that match the
longest lexeme, the first listed matching pattern is
chosen.
LEX Actions
 Data structures are classified on the basis of the following criteria:
 1. Nature of the data structure : linear or non-linear data
structure
 2. Purpose of the data structure : search or allocation data
structure
 3. Lifetime of the data structure : used during language
processing or during target program execution
 A linear data structure consists of a linear arrangement of
elements in memory. It requires a contiguous area of
memory for its elements, which can lead to wastage of memory.
 The elements of a non-linear data structure are accessed using
pointers, so the elements need not occupy a contiguous area of
memory and there is no wastage of memory. However, this leads
to lower search efficiency.
SYMBOL TABLE DATA STRUCTURE FOR
LANGUAGE PROCESSING
 Search data structures are used during language
processing to maintain attribute information concerning
different entities in the source program.
 This type of data structure is characterized by the fact
that the entry for an entity is created only once but may be
searched for a large number of times; the important
point here is search efficiency.
 An allocation data structure is characterized by the fact that
the address of the memory area allocated to an entity is known
to the user of that entity.
 In this method, search operations are not conducted. The
important points are allocation/de-allocation speed
and efficiency of memory utilization for this type of data
structure.
SYMBOL TABLE DATA STRUCTURE FOR
LANGUAGE PROCESSING
 A search data structure is a set of entries, each entry
accommodating the information concerning one entity. Each entry
contains a key field, and this field is used for searching.
 ENTRY FORMATS
 A set of fields is used in a search structure for each entry.
An entry consists of two parts:
 1. Fixed part
 2. Variant part
 A compiler's symbol table has the following entries:
 a. Fixed part : fields symbol and class
 b. Variant parts :
SEARCH DATA STRUCTURES
SEARCH DATA STRUCTURES
Sr. No | Tag Value      | Variant Part Fields
1      | Variable       | Type, length, dimension information
2      | Label          | Statement number
3      | Procedure name | Address of parameter list, number of parameters, type of return value, length of returned value
 Entry format
 a. fixed length
 b. Variable length
HOW TO STORE NAMES IN SYMBOL TABLE
• There are two types of name representation.
• 1. Fixed-length names
• A fixed space for each name is allocated in the symbol table. In this type
of storage, if a name is too small there is a wastage of space.
• The name can be referred to by a pointer to its symbol table entry.
• A benefit of this linear organization is that it enables the use of efficient
search procedures.
CONT…
• 2. Variable-length names
• Only the amount of space required by the string is used to store each name.
• The names can be stored with the help of the starting index and
length of each name.
• There is no memory wastage in this organization.
• EXAMPLE
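A minimal sketch of this scheme, assuming a hypothetical character pool in which names are stored back to back and referenced by (starting index, length) pairs:

```c
#include <string.h>

/* Illustrative name pool: all names share one character array; each
   symbol records only its starting index and length, so no space is
   wasted on padding. (No overflow check in this sketch.) */
#define POOL_SIZE 256

static char pool[POOL_SIZE];   /* names stored back to back   */
static int  pool_used = 0;     /* next free index in the pool */

struct name_ref { int start; int len; };

/* Append a name to the pool and return its (start, length) reference. */
static struct name_ref store_name(const char *name) {
    struct name_ref r = { pool_used, (int)strlen(name) };
    memcpy(pool + pool_used, name, (size_t)r.len);
    pool_used += r.len;
    return r;
}
```

Storing "count" and then "rate" yields references (0, 5) and (5, 4) into the pool "countrate".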
HYBRID ENTRY FORMAT
 The hybrid entry format combines the access efficiency of the fixed entry format with the memory efficiency of the variable entry format.
 In this format, each entry is split into two halves: a fixed part and a variant part.
 A pointer field in the fixed part points to the variant part of the entry.

[Figure: hybrid entry format — | Fixed part | Pointer | --> | Length | Variant part |]
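The hybrid format can be sketched as a fixed-size record whose pointer field leads to a separately allocated variant half that records its own length; the names here are illustrative:

```c
#include <stdlib.h>
#include <string.h>

/* Illustrative hybrid entry: constant-size fixed half, plus a pointer
   to a variable-size variant half allocated elsewhere. */
struct variant_part {
    int  length;              /* size of the data that follows      */
    char data[];              /* flexible array member (C99)        */
};

struct hybrid_entry {
    char symbol[16];          /* fixed half: key field              */
    struct variant_part *vp;  /* pointer to the variant half        */
};

/* Attach a variant part holding len bytes copied from src.
   (No allocation-failure handling in this sketch.) */
static void attach_variant(struct hybrid_entry *e, const void *src, int len) {
    e->vp = malloc(sizeof(struct variant_part) + (size_t)len);
    e->vp->length = len;
    memcpy(e->vp->data, src, (size_t)len);
}
```

Every fixed half has the same size, so the table itself can use efficient fixed-length search, while the variant halves consume only the memory each entry actually needs.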
OPERATIONS ON SEARCH DATA STRUCTURES
 1. Add : add the entry of a symbol to the symbol table.
 2. Search : search for and locate the entry of a symbol.
 3. Delete : delete the entry of a symbol.

TABLE ORGANIZATION
 A table is a linear data structure. The entries of a table occupy adjoining areas of memory.
 Fixed-length entries are used in linear data structures.
[Figure: a table of n entries; entries #1 to #f are occupied, the remaining entries are free]
 Symbols used:
 n = number of entries in the table
 f = number of occupied entries
 Operations
 1. Add a symbol : the symbol is added to the first free entry in the table, and the value of f is updated accordingly.
 2. Delete a symbol : deletion can be done in two ways:
 a. Physical deletion : the entry is deleted by erasing it or by overwriting it.
 b. Logical deletion : the entry is marked as deleted by adding some information to it indicating its deletion.
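The add operation and logical deletion can be sketched as follows, assuming a hypothetical `active` flag as the extra information that records a logical deletion:

```c
#include <string.h>

/* Illustrative linear table: f occupied entries out of n slots.
   Deletion is logical: an entry is marked inactive, not erased. */
#define N 16                       /* n = number of entries in the table */

struct slot { char symbol[32]; int active; };

static struct slot table[N];
static int f = 0;                  /* f = number of occupied entries */

/* Add a symbol to the first free entry and update f. */
static int add_symbol(const char *name) {
    if (f == N) return -1;         /* table full */
    strncpy(table[f].symbol, name, sizeof table[f].symbol - 1);
    table[f].active = 1;
    return f++;
}

/* Logical deletion: record the deletion in the entry itself. */
static void delete_symbol(int idx) {
    table[idx].active = 0;
}
```

A search routine over such a table would skip entries whose `active` flag is clear, so logically deleted symbols become invisible without any data movement.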
ALLOCATION DATA STRUCTURES
 1. Stack
 PROPERTIES
 1. A stack is an unbounded array of entries that is handled in a last-in, first-out (LIFO) manner: the last entry stored is the first one removed.
 2. Only the last entry is accessible at any time.
 a. Stack pointer (SP) : indicates the position of the entry at the top of the stack.
 b. Stack base (SB) : points to the first word of the stack area.
 c. Top of stack (TOS) : points to the last entry allocated in the stack.
 When an entry is pushed on the stack, TOS is incremented by 1; when an entry is popped, TOS is decremented by 1.
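The TOS discipline described above, where a push increments TOS before storing and a pop reads the top entry before decrementing, can be sketched as:

```c
/* Illustrative stack model: TOS indexes the last allocated entry, so
   an empty stack has TOS == -1 (just below the stack base). */
#define STACK_SIZE 8

static int stack_area[STACK_SIZE];  /* stack_area[0] is the stack base (SB) */
static int tos = -1;                /* TOS: last entry allocated            */

/* Push: increment TOS by 1, then store the entry there. */
static int push(int value) {
    if (tos + 1 == STACK_SIZE) return -1;   /* overflow */
    stack_area[++tos] = value;
    return 0;
}

/* Pop: read the top entry, then decrement TOS by 1. */
static int pop(void) {
    return stack_area[tos--];   /* caller must ensure the stack is non-empty */
}
```

Because only the top entry is accessible, allocation and de-allocation are just pointer arithmetic on TOS, which is what makes stack allocation so fast.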
EXTENDED STACK MODEL
 Apart from SB and TOS, a record base pointer (RB) and reserved pointers are used in the extended stack model.
 The record base pointer points to the first word of the last record in the stack.
 The reserved pointer is the first word of each record.

[Figure: the extended stack model — (b) allocation, (c) de-allocation]
HEAP
 A heap is a non-linear data structure. It permits the allocation and de-allocation of entries in a random order.
 There is no implicit way to access an allocated memory area, so pointers are used for allocation and de-allocation.
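Random-order allocation and de-allocation through pointers is exactly the behavior that C's `malloc`/`free` provide; a small illustrative check:

```c
#include <stdlib.h>
#include <string.h>

/* Sketch of heap behavior: entries are allocated and freed in an
   arbitrary order, and each is reachable only through its pointer. */
int demo_heap(void) {
    char *a = malloc(8);
    char *b = malloc(8);
    strcpy(a, "first");
    strcpy(b, "second");
    free(a);               /* freed in a different order than allocated */
    int ok = (strcmp(b, "second") == 0);  /* b remains valid after freeing a */
    free(b);
    return ok;
}
```

Unlike the stack, where de-allocation must undo the most recent allocation, the heap allocator must track free areas internally so that any entry can be released at any time.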
END OF CHAPTER!!!
Overview of language processor course d&a

  • 1.
    PREARED BY: Zinal Gohil ASST.PROF. (CE/IT) (PPSU,SOE) 2. OVERVIEW OF LANGUAGE PROCESSOR
  • 2.
     Programming Languagesand Language Processors,  Language Processing Activities,  Program Execution,  Fundamental of Language Processing,  Symbol Tables;  Data Structures for Language Processing: Search Data structures, Allocation Data Structures OUTLINE
  • 3.
     Language processorcomprises of compilers, assemblers and interpreters  Programmers write software’s in variety of languages and compilers and interpreters translates it in to instructions that are understood by the computer at machine level.  Language processing activities occur due to difference between how software is made by the programmer and how it is implemented by computer.  The software programmers mentions two domains  1. APPLICATION DOMAIN : To present the idea  2. EXECUTION DOMAIN : for carrying of these ideas OVERVIEW OF PROGRAMMING LANGUAGE AND LANGUAGE PROCESSOR
  • 4.
     Semantics pertainsto the meaning of words. The semantics of a language is a description of what the sentences mean. It is much more difficult to express the semantics of a language than it is to express the syntax.  In order to implement a programming language we must know what each sentence means (declaration, expression, etc).  E.g., does the sentence  produce an output,  take any inputs,  change the value stored in a variable,  produce an error. TERMINOLOGIES 1. SEMANTICS (MEANING)
  • 5.
     Domain: Itrefers to the scope or sphere of any activity.  Application Domain: The scope of an application is its application domain.  E.g., the application domain of an inventory program is warehouse and its associated tangibles (goods, machinery, etc), transactions (e.g., receiving goods, purchase orders, locating goods, shipping of goods, receiving payments, etc), people (e.g., workers, managers, customers).  All the above are objects in the application domain. The application domain can best be described by a person in that domain. E.g., the warehouse manager in the above example. TERMINOLOGIES 2. APPLICATION DOMAIN
  • 6.
     Execution Domain:(also called as the solution domain). The execution domain is the work of programmers, e.g., program code, documentation, test results, files, computers, etc.  The solution domain is partitioned into two levels:  Abstract, high-level documents, such as flow charts, diagrams  Low-level – data structures, function definitions, etc. TERMINOLOGIES 3. EXECUTION DOMAIN
  • 7.
     The differencebetween the semantics of the application domain and the execution domain is called the semantic gap. TERMINOLOGIES 4. SEMANTIC GAP
  • 8.
     Consequences ofsemantic gap:  Large development times – interaction between designers in application domain and programmers.  Large development efforts.  Poor quality of software. CONSEQUENCES OF SEMANTIC GAP
  • 9.
     The semanticgap is reduced by programming languages (PL). The use of a PL introduces a new domain called the programming language domain (or PL domain).  The PL domain bridges the gap between the application domain and the execution domain. HOW IS THE SEMANTIC GAP REDUCED?
  • 10.
     Specification gap:It is the semantic gap between the application domain and the PL domain.  It can also be defined as the semantic gap between the two specifications of the same task.  The specification gap is bridged by the software development team.  Execution gap: It is the gap between the semantics of programs written in different programming languages.  The execution gap is bridged by the translator or interpreter. SPECIFICATION GAP AND EXECUTION GAP Advantages of introducing the PL domain: (a) Large development times are reduced. (b) Better quality of software. (c) Language processor provides diagnostic capabilities which detects errors
  • 11.
     Language Processor:It is a software which bridges the specification or execution gap.  Language Processing: It is any activity performed by a language processor.  Diagnostic capability is a feature of a language processor. The input of a language processor is the source program. The output of a language processor is the target program. The target program is not produced if the language processor finds any errors in the source program. TERMS
  • 12.
     (a) LanguageTranslator: This bridges the execution gap to the machine language of a computer system. Examples are compiler and assembler.  (b) De-translator: Similar to translator, but in the opposite direction.  (c) Preprocessor: This is a language processor whose source and target languages are both high level, i.e., no translation takes place.  (d) Language migrator : It fills the specification gap between two PL’s.(Used to convert program written in one programming language in to another programming language) It may be used to provide portability of program by migrating it to more modern programming language  The quality of target program is depends on semantics of two programming languages TYPES OF LANGUAGE PROCESSOR
  • 13.
     In caseof problem-oriented languages. The PL domain is very close to the application domain. The specification gap is reduced in this case. Such PLs can be used only for specific applications, hence they are called problem-oriented languages.  They have a large execution gap, but the execution gap is bridged by the translator or interpreter. Using these languages, we only have to do specify “what to do”.  Software development takes less time using problem- oriented languages, but the resultant code may not be optimized. Examples : Fourth generation languages (4GL) like SQL. PROBLEM-ORIENTED LANGUAGES:
  • 14.
     These providegeneral facilities and features which are required in most applications. These languages are independent of application domains.  Hence, there is a large specification gap. The gap must be bridged by the application designer. Using these languages, we have to specify “what to do” and “how to do”.  Examples. C, C++, FORTRAN, etc. PROCEDURE-ORIENTED LANGUAGES:
  • 15.
     A compileris a language translator. It translates a source code (programs in a high-level language) into the target code (machine code, or object code).  To do this translation, a compiler steps through a number of phases. The simplest is  a 2-phase compiler. The first phase is called the front end and the second phase is called the back end. COMPILERS
  • 16.
     Front End:The front end translates from the high-level language to a common intermediate language. The front end is source language dependent but it is machine- independent. Thus, the front end consists of the following phases:  lexical analysis, syntactic analysis, creation of symbol table, semantic analysis and generation of intermediate code. The front end also includes error-handling routines for each of these phases.  Back End: The back end translates from this common intermediate language to the machine code.  The back end is machine dependent. This includes code optimization, code generation, error-handling and symbol table operations. Thus, a compiler bridges the execution gap. COMPILERS
  • 17.
     It isa language processor. It also bridges the execution gap but does not generate the machine code. An interpreter executes a program written in a high level language.  The essential difference between a compiler and an interpreter is that while a compiler generates the machine code and is then no longer needed, an interpreter is always required. INTERPRETER
  • 18.
     Language processingactivities are related to specification gap and execution gap.  It is divided in to two types.  1. Program generation  2. Program Execution  Aim of program generation activity is to generate automatic program. In this activity the specification language of application domain is the source language  A procedure oriented language is the target language. Source language is specification language  1. PROGRAM GENERATION ACTIVITY  Program generator is a system software. Program specification is input to this system software. It generates output in target language. LANGUAGE PROCESSING ACTIVITIES
  • 19.
     Here thespecification gap is gap between application domain and program generator domain. LANGUAGE PROCESSING ACTIVITIES User Application Domain Program Generator Domain Target Programming Language Domain Program Execution Domain Specification Gap
  • 20.
     It reducesspecification gap . and reliability of generated program is increases. It also helps programmer for easily writing specification of program.  Compiler is used to bridge the gap between target PL and the execution domain.  2. PROGRAM EXECUTION  Methods of program execution  (a) Program translation  (b) Program interpretation LANGUAGE PROCESSING ACTIVITIES
  • 21.
     (a) Program Translation It bridges the execution gap by translating source program in to target program.  Source program is written in to programming language and a target program is an assembly language. LANGUAGE PROCESSING ACTIVITIES
  • 22.
     Characteristics:  1.Before execution of program, it must be translated  2. Translated program may be saved in to files  3. Program must be retranslated with modifications.  (b) PROGRAM INTERPRETATION  It reads source program and store it in main memory. During program interpretation it takes a source statement and determines its meaning then it perform the actions which to be implement on that statements.  The action may be computational and input –output. LANGUAGE PROCESSING ACTIVITIES
  • 23.
    Program counter increments the memoryaddress for next instruction. CPU uses program counter for next instruction.  Instruction execution cycle consists of three steps  1. Fetch 2. Decode 3. Execution LANGUAGE PROCESSING ACTIVITIES (b) Execution
  • 24.
     This cycleis repeated for all instructions. The instruction address in the program counter is updated at the end of the cycle CPU select the next instruction for execution.  The above process is called interpretation cycle. Interpretation cycle consists of :  1. Fetch the statement  2. Analyze the statement  3. Execute the statement  CHARACTERISTICS OF INTERPRETATION 1. Source program is retained in source form itself  2. Statement is analyzed during interpretation LANGUAGE PROCESSING ACTIVITIES
  • 25.
    FUNDAMENTAL OF LANGUAGE PROCESSING Language processing is the combination of analysis of SP and synthesis of TP. specification of source program consists of three components.  1. Lexical rule  2. Syntax Rule  3. Semantic Rule
  • 26.
    • The sourceprogram can be analyzed in three phases- • 1. Linear-lexical Analysis : In this type of analysis the source string is read from left to right and grouped in to tokens. • EX : Tokens for a language can be identifiers, constants, relational operations, keywords. • 2. Hierarchical(Syntax) Analysis : In this analysis, characters or tokens are grouped hierarchically in to nested collections for checking them syntactically. • 3. Semantic Analysis : This kind of analysis ensures the correctness of meaning of program. ANALYSIS OF SOURCE PROGRAM
  • 27.
    FUNDAMENTAL OF LANGUAGE PROCESSING Synthesis phase is concerned with the construction of target language statements which have the same meaning as a source statement . It consists of two main activities  Code optimization : generation of various data structures of target program.  Code generation : It generates the target code.
  • 28.
  • 29.
    • ANALYSIS PART •1. LEXICAL ANALYSIS : • The lexical analysis is also called scanning. It is the phase of compilation in which the complete source code is scanned and your source program is broken up in to group of strings called token. • A token is a sequence of characters having a collective meaning. • For example if some assignment statement in your source program is as follow: • total =count + rate * 10 PHASES OF COMPILER
  • 30.
    • total =count+ rate * 10 • In lexical Analysis phase this statement is broken up in to series of tokens as follow: • 1. the identifier total • 2. The assignment symbol • 3. the identifier count • 4. The plus symbol • 5. The identifier rate • 6. The multiplication symbol • 7. the constant number 10 The blank characters which are used in the programming statements are eliminated during lexical analysis. LEXICAL ANALYSIS Parse tree for total =count + rate * 10
  • 31.
    • The syntaxanalysis is also called parsing. In this phase the tokens generated by the lexical analysis are grouped together to form hierarchical structure. • The syntax analysis determines the structure of source string by grouping the tokens together. • The hierarchical structure generated in this phase is called parse tree or syntax tree. • For expression total= count + rate *10 the parse tree will like below 2. SYNTAX ANALYSIS
  • 32.
    • In thatstatement first rate *10 will be considered because in arithmetic expression the multiplication operator should be performed before the addition. And then addition operation will considered. 2. SYNTAX ANALYSIS
  • 33.
    • Once thesyntax is checked in the syntax analyzer phase the next phase (i.e. semantic analysis) determines the meaning of source string. For example meaning of matching parenthesis in the expression or matching of if…else statements or performing arithmetic operations that are type compatible or checking the scope of variable.  Thus the three phases are performing the task of analysis.  After these phases an intermediate code gets generated 3. SEMANTIC ANALYSIS
  • 34.
    • The intermediatecode is the kind of code which is very easy to generate and this code can be easily converted to target code. • This code is in variety of form such as three address code, quadruple, triple, posix. • Intermediate code in three address form is given below which is like an assembly language. • The three address code consists of instructions each of which has at the most three operands • EX : t1 = int to float(10) t2 = rate * t1 t3 = count + t2 total = t3 • There are certain properties which should be processed by the three address code 4. INTERMEDIATE CODE GENERATION
  • 35.
    • There arecertain properties which should be processed by the three address code • 1. Each three address instruction as at most one operator in addition to the assignment. Thus the compiler has to decide the order of the operations devised by the three address code. • 2. Compiler must generate a temporary name to hold the value computed by each instruction. • 3. Some three address instructions may have fewer then three operands for example first and last instruction of above three address code. • EX : t1 = int to real (10) • Total = t3 4. INTERMEDIATE CODE GENERATION
  • 36.
    • The codeoptimization phase attempts to improve the intermediate code. • This is necessary to have a faster executing code or less consumption of memory. • Thus by optimizing the code overall running time of the target program can be improved. 5. CODE OPTIMIZATION
  • 37.
    • In thisgeneration phase the target code gets generated. • The intermediate code instructions are translated in to sequence of machine instructions. • MOV rate, R1 • MUL #10.0, R1 • MOV count, R2 • ADD R1, R1 • MOV R1, total 6. CODE GENERATION
  • 38.
    • To supportphases of compiler symbol table is maintained. The task of symbol table is to store identifiers (variables) used in the programs. • The symbol table also stores the information about attributes of each identifier. The attributes of identifier are usually it’s type, it’s scope, information about the storage allocated for it. • The symbol table also stores information about subroutines used in program (In case subroutine, the symbol table stores the name of subroutine, number of arguments passed to it, type of these arguments, the method of passing these arguments –either call by value or call by reference and return type if any ) • The symbol table allows to find records for each identifier quickly and to store or retrieve data from the record efficiently. SYMBOL TABLE MANAGEMENT
  • 39.
    • During compilationlexical analyzer detects the identifier and makes its entry in the symbol table. • How ever lexical analyzer can not determine all the attributes of an identifier and therefore the attributes are entered by remaining phases of compiler. • Various phases can use the symbol table in various ways. EX – while doing semantic analysis the intermediate code generation, we need to know what type of identifier are. Then during code generation typically information about how much storage is allocated to identifier is seen. SYMBOL TABLE MANAGEMENT
  • 40.
    • As programsare written by human beings therefore they can not be free from errors. • In compilation, each phase detects errors. These errors must be reported to error handler whose task is to handle the errors so that the compilation can proceed. • Normally the errors are reported in form of messages. When input character from the input do not form token, the lexical analyzer detects it as error. • Large number of errors can be detected in syntax analysis phase. Such errors are popularly known as syntax errors. • During semantic analysis type mismatch kind of errors is usually detected. ERROR DETECTION AND HANDLING
  • 41.
    • Input a= b + c * 60 EXAMPLE ON PROCESS OF COMPILATION
  • 43.
    SYMBOL TABLE ENTRIES •Compiler/interpreter uses symbol table to achieve compile time efficiency. • It associates lexical names with their attributes. • the items to be stored in symbol table are: 1) variable names 2) constants 3) procedure names 4) literal constants and strings 5) compiler generated temporaries 6) labels in source language
  • 44.
    • Compiler usesfollowing types of information from symbol table. 1) data type 2) Name 3) declaring procedures 4) offset in storage 5) if structure or record then pointer to structure 6) for parameter, whether parameter passing is by value or by reference? 7) Number and type of arguments passed to function 8) base address SYMBOL TABLE ENTRIES
  • 45.
    1) variable names: when variable is identified, it is stored in symbol table by it’s name. The name must be unique. 2) Constants : The constants are stored in symbol table. These constants can be accessed by compiler with the help of pointers. 3) Data types: The data type of associated variable is stored in symbol table. 4) compiler generated temporaries : The intermediate code is generated by compiler. During this process many temporaries may generated which are stored in symbol table. 5) Function names : The names of functions can be stored in symbol table. 6) parameter names : The parameter that are passed to the function are stored in symbol table. The information such as call by value or call by reference is also stored in symbol table. 7) scope information : The scope of variable, where it can be used, (-1) is used to store permanent symbols such as keywords (0) is used to store global symbols (1) is used to store symbols defined in main program ATTRIBUTES OF SYMBOL TABLE
  • 46.
     Symbol tablehave following attributes to store the information of data  1. Symbol name :Symbol names are the name given to the variable. They are of two types:  (i) Fixed length  (ii) Variable length ATTRIBUTES OF SYMBOL TABLE
  • 47.
    HOW TO STORENAMES IN SYMBOL TABLE • There are two types of name representation. • 1. Fixed length name • A fixed space for each name is allocated in symbol table. In this type of storage if name is too small then there is a wastage of space. • The name can be referred by pointer to symbol table entry
  • 48.
    CONT… • 2. Variablelength record • Amount of space required by string is used to store names. • The names can be stored with the help of starting index and length of each name. • EXAMPLE
  • 49.
     1. Initializethe symbol table and make all it’s entries empty  2. Store the symbol and it’s attribute  3. Find a symbol  4. Insert the new symbol  5. delete a symbol  6. enter scope level OPERATIONS ON SYMBOL TABLE
  • 50.
  • 51.
    TERMS COMMONLY USEDIN STRINGS TERM MEANING Prefix of string A string obtained by removing zero or more tail symbols. For example for string Hindustan the prefix could be ‘Hindu’ Suffix of string A string obtained by removing zero or more leading symbols ,For example , for string Hindustan the suffix could be ‘dustan’ Substring A string obtained by removing prefix and suffix of a given string is called substring. For example For string Hindustan the srting ‘indu’ can be substring. Sequence of string Any string formed by removing zero or more not necessarily the contiguous symbols is called sequence of string. For example Hisan can be sequence of string
  • 52.
    OPERATIONS ON LANGUAGE OPERATIONDESCRIPTION Union of two languages L1 and L2 L1 U L2 = { set of strings in L1 and strings i L2 } Concatenation of two languages L1 and L2 L1 . L2 = { set of strings in L1 followed by set of strings in L2 } Kleene closure of L Positive closure of L     0 i L * L i L of ions concatenat more or one denotes L , L 1 i      L L* denotes zero or more Concatenations of L
  • 53.
     The finiteset which denotes a regular language and the set which can be described by regular expression is called regular set.  EXAMPLE : A set of identifier is regular set because it can be represented using regular expression. REGULAR SET
  • 54.
    Definition of Regularlanguage and regular expression over ∑  The set R of regular language over and ∑ corresponding regular expressions are defined as follow :  1. ϕ is an element of R and corresponding regular expression is ϕ  2. { ^ } is an element of R and corresponding regular expression is ^  3. for each a є A, {a} is an element of R and corresponding R.E. is a
  • 55.
    Definition of Regularlanguage and regular expression over ∑  4. if L1 and L2 are any elements of R and r1 and r2 are it’s corresponding regular expressions then  (a) L1 U L2 is an element of R and corresponding R.E. is (r1 + r2)  (b) L1L2 is an element of R and corresponding R.E. is (r1 r2)  (c) L1 * is an element of R and corresponding R.E. is ( r1 ) * only those language that can be obtained by statement 1-4 are regular over ∑
  • 56.
     R.E. =(0+1) (0+1)  EXAMPLE 2: regular expression for language containing string which ends with “abb” over Σ= { a,b}  R.E. = (a+b) * abb  Example 3:Write regular expression to identify identifier  To denote identifier we consider a set of latters and digits because identifier is a combination of letter and digit but having first character as letter always.  R. E. = letter (letter + digit )* EXAMPLE 1 : write a R.E. for language containing the strings of length two over Σ= { 0,1}
  • 57.
     Various toolhas been built for constructing lexical analyzers using the special purpose notations called regular expressions.  The regular expressions are used in recognition of tokens.  A tool called LEX gives a special language that specifies the tokens using regular expressions.  The LEX file has .l extension. suppose we create one file x.l .  This x.l is then given to LEX compiler to produce lex.yy.c .  This lex.yy.c is a C program which is actually a lexical analyzer program.  As we know that specification file stores the regular expression for tokens, the lex.yy.c file consists of tabular representation of transition diagrams constructed for A language for specifying lexical analysis
  • 58.
     The lexemescan be recognized by with help of tabular transition diagrams and standard routines.  In specification file of LEX actions are associated with each regular expression.  This actions are simply C code.  This C code is directly carried out over lex.yy.c file.  Finally C compiler compiles generated lex.yy.c and produces an object program a.out. when some input stream is given to a.out then sequence of token is generated. A language for specifying lexical analysis
  • 59.
    A language forspecifying lexical analysis
  • 60.
     The LEXprogram consists of three parts  1. Declaration section  2. Rule section  3. Procedure section A language for specifying lexical analysis % { DECLARATION SECTION %} %% RULE SECTION %% AUXILARY PROCEDURE SECTION In declaration section declaration of variables, constants, can be done. Some regular definitions can also be written in this section. the regular definitions are basically components of regular
  • 61.
     The Rulesection consists of regular expressions associated with actions. These transition rules can be given in form as-  And third section auxiliary procedure section in which all the required procedures are defined. Some times these procedures are required by actions in the rule section.  the lexical analyzer and scanner works in co-ordination of parser.  When activated by parser , lexical analyzer begin reading its remaining input, character by character at a A language for specifying lexical analysis R1 { action1 } R2 { action 2 } . . . Rn { action n} Where each Ri is a regular expression and each actioni is a program fragment describing what action is to be taken for corresponding regular expression
  • 62.
     When stringis matched with one of the regular expression Ri then corresponding actioni will get executed and this actioni returns the controller to parser.  The repeated search for the lexeme can be made in order to return all the tokens in source string.  The lexical analyzer ignores the white spaces and comments in this process. A language for specifying lexical analysis % { #include<stdio.h> %} %% Rama | Seeta|Geeta| Neeta { printf (“n Noun “); } Sings| Dances | eats { printf(“n verb”); } int main() { yylex(); return 0; } int yywrap() { return 1; }
  • 63.
     The programmentioned in previous slide recognizes noun and verb from the string clearly.  There are three section in that program  The section starting and ending with % { and %} respectively is a definition section.  The section starting with %% is called rule section. this section is closed by %%  within %% consists of regular expression and actions. Rule 1 gives definition of noun and second rule gives definition of verb.  The third section consists of two functions the main function and yywrap function.  here main function calls yylex() function. yylex() function is defined in lex.yy.c file. A language for specifying lexical analysis
  • 64.
     first wewill compile our above program x.l using lex compiler and then LEX compiler will automatically generates C program named lex.yy.c. This lex.yy.c makes use of regular expression and corresponding actions defined in x.l.  Hence our program x.l is called lex specification file.  When we compile lex.yy.c using gcc compiler as cc lex.yy.c , we get an output file a.out – default output file of LINUX platform and on execution of a.out we can give input string A language for specifying lexical analysis $ lex x.l $ gcc lex.yy.c $ ./a.out This command generates lex.yy.c This command compiles lex.yy.c (we can also use gcc in place of cc This command runs executable file
  • 65.
    A language forspecifying lexical analysis Rama eats Noun Verb Seeta sings Noun Verb After entering these commands a blank space for entering input gets available. Then we can give some valid input. Then press CTRL +C or CTRL + D to come out of output. $ lex x.l $gcc lex.yy.c $ ./a.out
  • 66.
    LEX specification andfeatures REGULAR EXPRESSION MEANING * Matching with zero or more occurrences of preceding expression. For example, 1* occurrence of 1 for any number of times . Matches any single character other than new line [ ] A character class which matches with any character within the bracket. For example: [a-z] matches with any alphabet in lower case. ( ) Group of regular expressions together put in to a new regular expression r[m,n] m to n occurrence of r example : a[3,5]
  • 67.
    LEX specification andfeatures REGULAR EXPRESSION MEANING $ Matches with the end of line as last character. + Matches with one or more occurrence of preceding expression. Example: [0-9]+ any number but not empty string ? Matches zero or one occurrence of preceding regular expression. For example [+-]? [0-9]+ a number with unary operator ^ Matching the beginning of a line as first character. [ ^S ] Used as for negation. Any character except S. For example [^verb] means except verb match with anything else Used as escape meta character
  • 68.
     1. BEGIN: - It indicates the start state. The lexical analyzer starts at state 0  2. ECHO :- It emits the input as it is  3. yytext :- when lexer matches or recognizes the token from input then the lexeme is stored in null terminated string called yytext.  as soon as new token is found the content of yytext is replaced by new token.  4. yylex() :- as soon as call to yylex() is encountered scanner starts scanning the source program.  5. yywrap() :- The function yywrap() is called when scanner encounters end of file.The yywrap returns 0 then scanner continuous scanning. When yywrap() returns 1 that means end of file is LEX Actions
  • 69.
     6. yyin:- It is standard input file that stores input source program.  7. yyleng :- when lexer recognizes token then the lexeme is stored in NULL terminated string called yytext and yyleng stores the length of string ,so we can say that this function is same as strlen()  8. HOW TO WRITE main() in LEX int main() { yylex(); } LEX Actions
 9. Where to write C code?
 Valid C code can be written between %{ and %}.
 Any C function can be written in the subroutines section.
 C code also appears as the action part for each regular expression.
 10. THE RECOGNIZER WORKS IN THE FOLLOWING WAY:
 i. If more than one pattern matches, the recognizer chooses the longest lexeme matched.
 ii. If two or more patterns match the same longest lexeme, the first listed matching pattern is chosen. LEX Actions
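The two disambiguation rules above can be sketched with a toy recognizer in Python (the rule names and patterns are invented for illustration; this is not LEX output):

```python
import re

# Ordered list of (token_name, pattern) -- listing order matters for rule ii.
RULES = [
    ("IF",     r"if"),          # keyword rule listed before the identifier rule
    ("IDENT",  r"[a-z]+"),
    ("NUMBER", r"[0-9]+"),
]

def next_token(text, pos=0):
    """Pick the longest match; on a tie, the first listed rule wins."""
    best = None                 # (length, -rule_index, name, lexeme)
    for index, (name, pattern) in enumerate(RULES):
        m = re.match(pattern, text[pos:])
        if m:
            cand = (len(m.group()), -index, name, m.group())
            if best is None or cand[:2] > best[:2]:
                best = cand
    if best is None:
        raise ValueError("no rule matches at position %d" % pos)
    return best[2], best[3]

# Rule i: "ifx" matches IF (length 2) and IDENT (length 3); the longer wins.
assert next_token("ifx = 5") == ("IDENT", "ifx")
# Rule ii: "if" matches IF and IDENT with equal length; the first listed wins.
assert next_token("if x") == ("IF", "if")
```

Keyword rules are therefore listed before the identifier rule in real LEX specifications, so that rule ii resolves the tie in the keyword's favour.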
 Data structures are classified on the basis of the following criteria:
 1. Nature of the data structure: linear or non-linear.
 2. Purpose of the data structure: search or allocation data structure.
 3. Lifetime of the data structure: used during language processing or during target program execution.
 A linear data structure is a linear arrangement of elements in memory. It requires a contiguous area of memory for its elements, which can lead to wastage of memory.
 The elements of a non-linear data structure are accessed using pointers and need not occupy a contiguous area of memory, so no memory is wasted; however, search efficiency is lower. SYMBOL TABLE: DATA STRUCTURES FOR LANGUAGE PROCESSING
 Search data structures are used during language processing to maintain attribute information concerning the different entities in the source program.
 They are characterized by the fact that an entry for an entity is created only once but may be searched for a large number of times; the important point is search efficiency.
 An allocation data structure is characterized by the address of the memory area allocated to an entity being known to the user of that entity.
 No search operations are performed on it; the important points are allocation and de-allocation speed and efficiency of memory utilization. SYMBOL TABLE: DATA STRUCTURES FOR LANGUAGE PROCESSING
 A search data structure is a set of entries, each entry accommodating the information concerning one entity. Each entry contains a key field, which is used for searching.
 ENTRY FORMATS
 A set of fields is used for each entry in a search structure. An entry consists of two parts:
 1. Fixed part
 2. Variant part
 A compiler's symbol table has the following:
 a. Fixed part: the fields symbol and class
 b. Variant part: fields that depend on the class of the entity SEARCH DATA STRUCTURES
SEARCH DATA STRUCTURES
 Variant part fields by tag value:
 1. Variable: type, length, dimension information
 2. Label: statement number
 3. Procedure name: address of parameter list, number of parameters, type of return value, length of returned value
 Entry formats:
 a. Fixed length
 b. Variable length
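The fixed/variant split above can be sketched in Python. The field names are illustrative, taken from the table; a real compiler would use packed records:

```python
# Sketch: a symbol table entry with a fixed part (symbol, class) and a
# variant part whose fields depend on the class (tag value).
ALLOWED_VARIANT_FIELDS = {
    "variable":  {"type", "length", "dimensions"},
    "label":     {"statement_number"},
    "procedure": {"param_list_addr", "num_params",
                  "return_type", "return_length"},
}

def make_entry(symbol, cls, **variant):
    """Build one entry; reject variant fields invalid for this class."""
    assert set(variant) <= ALLOWED_VARIANT_FIELDS[cls], \
        "field not valid for class %r" % cls
    return {"symbol": symbol, "class": cls, **variant}

v = make_entry("total", "variable", type="int", length=4)
l = make_entry("loop1", "label", statement_number=10)
assert v["class"] == "variable" and v["length"] == 4
assert l["statement_number"] == 10
```

The key field used for searching is `symbol`; the variant part carries only the attributes meaningful for that class of entity.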
HOW TO STORE NAMES IN THE SYMBOL TABLE
• There are two types of name representation.
• 1. Fixed-length names
• A fixed amount of space for each name is allocated in the symbol table. In this type of storage, if a name is short, part of the space is wasted.
• A name can be referred to by a pointer to its symbol table entry.
• The benefit of this linear organization is that it enables the use of efficient search procedures.
CONT…
• 2. Variable-length records
• Only the amount of space actually required by the string is used to store each name.
• Names are stored with the help of the starting index and length of each name.
• There is no memory wastage in this organization.
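A minimal sketch of variable-length name storage: all names live back to back in one string pool, and each entry records only a (start, length) pair. The class name `NamePool` is an invented illustration:

```python
# Sketch: variable-length name storage via (start, length) into one pool.
class NamePool:
    def __init__(self):
        self.pool = ""            # all names stored back to back, no padding
        self.entries = []         # one (start, length) pair per name

    def add(self, name):
        start = len(self.pool)
        self.pool += name
        self.entries.append((start, len(name)))
        return len(self.entries) - 1   # index serves as pointer to the entry

    def get(self, index):
        start, length = self.entries[index]
        return self.pool[start:start + length]

p = NamePool()
i = p.add("counter")
j = p.add("x")
assert p.get(i) == "counter" and p.get(j) == "x"
assert p.pool == "counterx"       # no wasted space between names
```

Compare with the fixed-length scheme: there, a one-character name like `x` would still occupy a full fixed-width slot.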
 The hybrid entry format is used to combine the access efficiency of the fixed entry format with the memory efficiency of the variable entry format.
 In this format each entry is split into two halves: a fixed part and a variant part.
 A pointer field in the fixed part, together with a length field, points to the variable part of the entry. HYBRID ENTRY FORMAT
 Layout: [ fixed part | pointer, length ] → [ variable part ]
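A sketch of the hybrid layout in Python (the variable `variant_area` and both function names are illustrative): the fixed part keeps a (pointer, length) pair locating the variant part in a separate area.

```python
# Sketch: hybrid entry format -- fixed parts are uniform in size, while
# variant parts of differing sizes are packed into a shared area.
variant_area = []                 # variant parts stored one after another

def add_hybrid_entry(symbol, cls, variant_fields):
    pointer = len(variant_area)
    variant_area.extend(variant_fields)
    # Fixed part: symbol, class, plus pointer to and length of variant part.
    return {"symbol": symbol, "class": cls,
            "ptr": pointer, "len": len(variant_fields)}

def variant_part(entry):
    return variant_area[entry["ptr"]:entry["ptr"] + entry["len"]]

e1 = add_hybrid_entry("total", "variable", ["int", 4, "no dims"])
e2 = add_hybrid_entry("loop1", "label", [10])
assert variant_part(e1) == ["int", 4, "no dims"]
assert variant_part(e2) == [10]
```

Because every fixed part is the same size, the table of fixed parts can still be searched efficiently, while the variant area wastes no space on short entries.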
 1. Add: add an entry to the symbol table.
 2. Search: search for and locate the entry of a symbol.
 3. Delete: delete the entry of a symbol.
 TABLE ORGANIZATION
 A table is a linear data structure; the entries of a table occupy adjoining areas of memory.
 Fixed-length entries are used in linear data structures. OPERATIONS ON SEARCH DATA STRUCTURES
 Layout: entries #1 … #f are occupied; entries #f+1 … #n are free.
 Symbols used: n = number of entries in the table, f = number of occupied entries.
 Operations:
 1. Add a symbol: the symbol is added to the first free entry in the table, and the value of f is updated accordingly.
 2. Delete a symbol: deletion can be done in two ways:
 a. Physical deletion: the entry is deleted by erasing or overwriting it.
 b. Logical deletion: information is added to the entry to mark it as deleted. TABLE ORGANIZATION
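The add and delete operations above can be sketched as follows (a Python model of the table; entry contents and the shift-based physical delete are illustrative):

```python
# Sketch: a table of n fixed-length entries, of which the first f are occupied.
n = 8
table = [None] * n                # None marks a free entry
f = 0                             # number of occupied entries

def add_symbol(data):
    """Add to the first free entry (index f) and update f."""
    global f
    assert f < n, "table full"
    table[f] = {"data": data, "deleted": False}
    f += 1

def delete_logical(i):
    table[i]["deleted"] = True    # entry kept, merely marked as deleted

def delete_physical(i):
    """Overwrite the entry by shifting the later entries up."""
    global f
    table[i:f] = table[i + 1:f] + [None]
    f -= 1

add_symbol("alpha"); add_symbol("beta"); add_symbol("gamma")
delete_logical(1)
assert table[1]["data"] == "beta" and table[1]["deleted"]  # still present
delete_physical(1)
assert f == 2 and table[1]["data"] == "gamma"              # physically gone
```

Logical deletion is cheap but leaves dead entries that searches must skip; physical deletion reclaims the slot at the cost of moving entries.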
     1. Stack PROPERTIES  1. stack is unbounded array that is treated in last in first out(LIFO) manner. The last element stored is first one removed.  2. Only the last entry is accessible at any time.  a. A stack pointer(SP) indicates the position or frame at the top of the stack.  b. Stack base(SB) – It points to the first word of the stack area.  c. Top of Stack(TOS) – It points to last entry allocated in the stack.  When entry is pushed on the stack, TOS is incremented by 1. when an entry is popped , it is decremented by 1. ALLOCATION DATA STRUCTURE
 Apart from SB and TOS, a record base pointer (RB) and reserved pointers are used in the extended stack model.
 The record base pointer points to the first word of the last record in the stack.
 The reserved pointer is the first word of each record; it saves the previous value of RB so that RB can be restored when the record is popped. EXTENDED STACK MODEL
 A heap is a non-linear data structure. The heap allows allocation and de-allocation of entries in a random order.
 There is no implicit order of access to an allocated memory area, so pointers are used to keep track of allocated and de-allocated entries. HEAP
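Random-order allocation and de-allocation can be sketched with a free list (a simplified Python model; a real heap allocator would also merge adjacent free blocks):

```python
# Sketch: heap allocation in random order via a first-fit free list.
heap_size = 16
free_list = [(0, heap_size)]          # (start, size) of each free block

def allocate(size):
    """First fit: take the first free block that is large enough."""
    for i, (start, block_size) in enumerate(free_list):
        if block_size >= size:
            remainder = (start + size, block_size - size)
            free_list[i:i + 1] = [remainder] if remainder[1] else []
            return start              # pointer to the allocated area
    raise MemoryError("no free block large enough")

def deallocate(start, size):
    free_list.append((start, size))   # real heaps also merge neighbours

a = allocate(4)                       # block at address 0
b = allocate(4)                       # block at address 4
deallocate(a, 4)                      # freed in a different order
c = allocate(8)                       # taken from the remaining tail block
d = allocate(4)                       # reuses the block freed earlier
assert (a, b, c, d) == (0, 4, 8, 0)
```

Unlike the stack, where only the last entry can be freed, any block here can be released at any time, which is why the allocator must track free areas with pointers.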