SYMBOL TABLE DESIGN




9/3/2012                         1
THE STRUCTURE OF A COMPILER

• Up to this point we have treated a compiler as a single box
  that maps a source program into a semantically equivalent
  target program.



Source Program        COMPILER               Target Program




9/3/2012                                                      2
WHAT’S INSIDE THIS BOX?
• If we open up this box a little, we see that there are two
  parts to this mapping:




            ANALYSIS                   SYNTHESIS




9/3/2012                                                       3
L             S           S                  C          C
 E             Y           E
                           Y                             O
                                     INTER    O
 X             N           M
                           N                             D
                           A                  D
 I             T           T        INTER-               E
                           N        MEDIA     E
 C             A           A        MEDIATE
 A                         T        TE
               X           X
                           I        CODE
 L                                                       G
                           C        GENER-    O
      Stream       Parse             CODE                E
      of
               A                    ATOR      P
 A                 tree    A                             N
      tokens   N                              T
 N                         N                             E
               A                              I
 A                         A        GENER                R
               L                              M
 L                         L        AT -                 A
                                              I
 Y             Y           Y                             T
                                    OR        Z
 Z             Z           Z                             O
                                              E
 E             E           E                             R
                                              R
 R             R           R


   Analysis
9/3/2012                                          Synthesis   4
                               SYMBOL TABLE
ANALYSIS

Breaks up the source                                The analysis part also
                         If the analysis part
program into                                        collects information
                         detects that the
constituent pieces and                              about the source
                         source program is
imposes a                                           program and stores it
                         either syntactically ill
grammatical                                         in a data structure
                         formed or
structure on them. It                               called a symbol
                         semantically unsound,
then uses this                                      table, which is passed
                         then it must provide
structure to create an                              along with the
                         informative messages,
intermediate                                        intermediate
                         so the user can take
representation of the                               representation to the
                         corrective action.
source program.                                     synthesis part.



9/3/2012                                                                     5
SYNTHESIS
• The synthesis part constructs the desired target program
  from the intermediate representation and the information
  in the symbol table

                        • Front end of compiler
           ANALYSIS

                        • Back end of compiler
           SYNTHESIS



9/3/2012                                                     6
COMPILERS ROLE??
• An essential function of a compiler –

            Record the variable names used in the source
           program and collect information about various
                      attributes of each name.

• These attributes may provide information about the
  storage allocated for a name , its type and its scope ,
  procedure names ,number and types of its arguments, the
  method of passing each argument and the type returned



9/3/2012                                                    7
SO , WHAT EXACTLY IS SYMBOL TABLE??

        Symbol tables are data structures that are used by
       compilers to hold information about source-program
       constructs.


 A symbol table is a necessary component because
      Declaration of identifiers appears once in a program
      Use of identifiers may appear in many places of the
       program text


9/3/2012                                                      8
INFORMATION PROVIDED BY
                SYMBOL TABLE


      Given an Identifier which name is it?
      What information is to be associated with a name?
      How do we access this information?




9/3/2012                                                   9
SYMBOL TABLE - NAMES
             Variable and labels

                               Parameter

                                          Constant

           NAME                                      Record
                                          RecordField

                                   Procedure
            Array and files

9/3/2012                                                      10
SYMBOL TABLE-ATTRIBUTES
• Each piece of information associated with a name is called
  an attribute.
• Attributes are language dependent.
• Different classes of Symbols have different Attributes

             Variable,      Procedure or
                                                Array
             Constants        function
           • Type , Line   • Number of      • # of
             number          parameters,      Dimensions,
             where           parameters       Array
             declared        themselves,      bounds.
             , Lines         result type.
             where
             referenced
             , Scope
9/3/2012                                                    11
WHO CREATES SYMBOL TABLE??
 Identifiers and attributes are entered by the analysis phases
  when processing a definition (declaration) of an
  identifier
 In simple languages with only global variables and implicit
  declarations:
      The scanner can enter an identifier into a symbol table
         if it is not already there
 In block-structured languages with scopes and explicit
  declarations:
      The parser and/or semantic analyzer enter identifiers
         and corresponding attributes
9/3/2012                                                    12
USE OF SYMBOL TABLE
 • Symbol table information is used by the analysis and
   synthesis phases
 • To verify that used identifiers have been defined
   (declared)
 • To verify that expressions and assignments are
   semantically correct – type checking
 • To generate intermediate or target code




9/3/2012                                                  13
IMPLEMENTATION OF SYMBOL TABLE
• Each entry in the symbol table can be implemented as a
  record consisting of several field.
• These fields are dependent on the information to be saved
  about the name
• But since the information about a name depends on the
  usage of the name the entries in the symbol table records
  will not be uniform.
• Hence to keep the symbol tables records uniform some
  information are kept outside the symbol table and a
  pointer to this information is stored in the symbol table
  record.

9/3/2012                                                  14
a   int                       LB1

                                            UB1




            SYMBOL TABLE

 A pointer steers the symbol table to remotely stored information
 for array a.

9/3/2012                                                       15
WHERE SHOULD NAMES BE HELD??

• If there is modest upper bound on the length of the name ,
  then the name can be stored in the symbol table record
  itself.
• But If there is no such limit or the limit is already reached
  then an indirect scheme of storing name is used.
• A separate array of characters called a ‘string table’ is
  used to store the name and a pointer to the name is kept in
  the symbol table record




9/3/2012                                                     16
LB1

                    int
                               UB1




               SYMBOL TABLE




A          B
                STRING TABLE
9/3/2012                             17
SYMBOL TABLE AND SCOPE
• Symbol tables typically need to support multiple
  declarations of the same identifier within a program.

      The scope of a declaration is the portion of a program
      to which the declaration applies.


• We shall implement scopes by setting up a separate
  symbol table for each scope.




9/3/2012                                                       18
 The rules governing the scope of names in a block-
  structured language are as follows
      1. A name declared within a block B is valid only
          within B.
      2. If block B1 is nested within B2, then any name that
          any name that is valid for B2 is also valid for
          B1,unless the identifier for that name is re-declared
          in B1.
 The scope rules required a more complicated symbol table
  organization than simply a list association between names
  and attributes.
 Each table is list names and there associated attributes
  and the tables are organized into a stack.

9/3/2012                                                     19
SYMBOL TABLE ORGANIZATION

               TOP
                      z     Real
                      Y     Real        Symbol table for block q
                      x     Real

Var x,y : integer

Procedure P:
                                   q        Real
Var x,a :boolean;
                                   a        Real        Symbol table for p
Procedure q:                       x        Real
Var x,y,z : real;

begin
……                                                  P         Proc
end                         Symbol table for main   Y         Integer
begin
                                                    X         Integer
….. 9/3/2012                                                            20
End
NESTING DEPTH
• Another technique can be used to represent scope
  information in the symbol table.
• We store the nesting depth of each procedure block in the
  symbol table and use the [procedure name , nesting
  depth] pairs as the key to accessing the information from
  the table.
  A nesting depth of a procedure is a number that is obtained by starting
  with a value of one for the main and adding one to it every time we go
  from an enclosing to an enclosed procedure.

• This number is basically a count of how many procedures
  are there in the referencing environment of the procedure .
9/3/2012                                                                    21
Var x,y : integer
                    x   3    Real
Procedure P:        y   3    Real
Var x,a :boolean;
                    z   3    Real
Procedure q:        q   2    Proc
Var x,y,z : real;
                    a   2   Boolean
begin               x   2   Boolean
……
end                 P   1    Proc
begin               y   1   Integer
…..
                    z   1   integer
End


     9/3/2012                         22
SYMBOL TABLE DATA STRUCTURES
 Issues to consider : Operations required
         • Insert
             – Add symbol to symbol table
         • Look UP
             – Find symbol in the symbol table (and get its
               attributes)
  Insertion is done only once
  Look Up is done many times
  Need Fast Look Up
 The data structure should be designed to allow the
      compiler to find the record for each name quickly and to
      store or retrieve data from that record quickly.
9/3/2012                                                       23
LINKED LIST
 A linear list of records is the easiest way to implement
  symbol table.
 The new names are added to the symbol table in the order
  they arrive.
 Whenever a new name is to be added to be added it is first
  searched linearly or sequentially to check if or the name is
  already present in the table or not and if not , it is added
  accordingly.
• Time complexity – O(n)
• Advantage – less space , additions are simple
• Disadvantages - higher access time.
9/3/2012                                                     24
UNSORTED LIST
01 PROGRAM Main
02      GLOBAL a,b
03      PROCEDURE P (PARAMETER x)
04                LOCAL a
05      BEGIN {P}
                                                Look up Complexity
06                …a…                                                     On
07                …b…
08                …x…
09      END {P}
10 BEGIN{Main}
11      Call P(a)
12 END {Main}

Name         Characteristic Class       Scope            Other Attributes
                                                Declared        Referenced      Other
Main             Program            0           Line 1
a                Variable           0           Line 2             Line 11
b                Variable           0           Line 2             Line 7
P                Procedure          0           Line 3             Line 11 1, parameter, x
x                Parameter          1           Line 3             Line 8
a 9/3/2012       Variable           1           Line 4             Line 6           25
SORTED LIST
01 PROGRAM Main
02      GLOBAL a,b
03      PROCEDURE P (PARAMETER x)    Look up Complexity
04                LOCAL a                                        O log 2 n
05      BEGIN {P}
06                …a…                If stored as array (complex insertion)
07                …b…
08                …x…                Look up Complexity          On
09      END {P}
10 BEGIN{Main}
                                     If stored as linked list (easy insertion)
11      Call P(a)
12 END {Main}

Name         Characteristic Class Scope            Other Attributes
                                          Declared        Reference      Other
a              Variable          0        Line 2             Line 11
a              Variable          1        Line 4             Line 6
b              Variable          0        Line 2             Line 7
Main           Program           0        Line 1
P              Procedure         0        Line 3             Line 11 1, parameter, x
x 9/3/2012     Parameter         1        Line 3             Line 8            26
SEARCH TREES
• Efficient approach for symbol table organisation
• We add two links left and right in each record in the search
  tree.
• Whenever a name is to be added first the name is searched
  in the tree.
• If it does not exists then a record for new name is created
  and added at the proper position.
• This has alphabetical accessibility.




9/3/2012                                                    27
BINARY TREE

                       Main     Program 0       Line1


                                            P       Procedure    1   Line3   Line11



                                                x    Parameter   1 Line3     Line8

a   Variable 0   Line2 Line11


                                        b   Variable 0 Line2 Line7


a Variable 1
   9/3/2012      Line4 Line6                                                          28
BINARY TREE

  Lookup complexity if tree
  balanced                    O log 2 n


  Lookup complexity if tree
  unbalanced
                              On




9/3/2012                                  29
HASH TABLE
• Table of k pointers numbered from zero to k-1 that points
  to the symbol table and a record within the symbol table.
• To enter a name in to the symbol table we found out the
  hash value of the name by applying a suitable hash
  function.
• The hash function maps the name into an integer between
  zero and k-1 and using this value as an index in the hash
  table.




9/3/2012                                                  30
HASH TABLE - EXAMPLE
                                                              M   n     a     b    P    x
                                                              77 110 97      98    80 120


                                                                      PROGRAM Main
0      Main Program 0 Line1
                                                                      GLOBAL a,b
1
                                                                      PROCEDURE
2                                                                     P(PARAMETER x)

3                                                                     LOCAL a
                                                                      BEGIN (P)
4
                                                                      …a…
5
                                                                      …b…
6     P Procedure 1 Line 3
                                                                      …x…
7     a Variable 0 Line2
                 1 Line4             a Variable 0 Line2
                                                                      END (P)
8
                                                                      BEGIN (Main)
9     b Variable 0 1Line2
      x Parameter     Line3           b Variable 0 Line2
                                                                      Call P(a)
10
                                                                      End (Main)

    H(Id)
 9/3/2012   = (# of first letter + # of last letter) mod 11                            31
REFERENCES


            O.G.Kakde - Compiler design
            Compilers Principles, Techniques & Tools -
           Second Edition : Alfred V Aho , Monica S Lam, Ravi
           Sethi, Jeffery D Ullman




9/3/2012                                                        32
9/3/2012   33

Symbol table design (Compiler Construction)

  • 1.
  • 2.
    THE STRUCTURE OFA COMPILER • Up to this point we have treated a compiler as a single box that maps a source program into a semantically equivalent target program. Source Program COMPILER Target Program 9/3/2012 2
  • 3.
    WHAT’S INSIDE THISBOX? • If we open up this box a little, we see that there are two parts to this mapping: ANALYSIS SYNTHESIS 9/3/2012 3
  • 4.
    L S S C C E Y E Y O INTER O X N M N D A D I T T INTER- E N MEDIA E C A A MEDIATE A T TE X X I CODE L G C GENER- O Stream Parse CODE E of A ATOR P A tree A N tokens N T N N E A I A A GENER R L M L L AT - A I Y Y Y T OR Z Z Z Z O E E E E R R R R R Analysis 9/3/2012 Synthesis 4 SYMBOL TABLE
  • 5.
    ANALYSIS Breaks up thesource The analysis part also If the analysis part program into collects information detects that the constituent pieces and about the source source program is imposes a program and stores it either syntactically ill grammatical in a data structure formed or structure on them. It called a symbol semantically unsound, then uses this table, which is passed then it must provide structure to create an along with the informative messages, intermediate intermediate so the user can take representation of the representation to the corrective action. source program. synthesis part. 9/3/2012 5
  • 6.
    SYNTHESIS • The synthesispart constructs the desired target program from the intermediate representation and the information in the symbol table • Front end of compiler ANALYSIS • Back end of compiler SYNTHESIS 9/3/2012 6
  • 7.
    COMPILERS ROLE?? • Anessential function of a compiler – Record the variable names used in the source program and collect information about various attributes of each name. • These attributes may provide information about the storage allocated for a name , its type and its scope , procedure names ,number and types of its arguments, the method of passing each argument and the type returned 9/3/2012 7
  • 8.
    SO , WHATEXACTLY IS SYMBOL TABLE?? Symbol tables are data structures that are used by compilers to hold information about source-program constructs. A symbol table is a necessary component because  Declaration of identifiers appears once in a program  Use of identifiers may appear in many places of the program text 9/3/2012 8
  • 9.
    INFORMATION PROVIDED BY SYMBOL TABLE  Given an Identifier which name is it?  What information is to be associated with a name?  How do we access this information? 9/3/2012 9
  • 10.
    SYMBOL TABLE -NAMES Variable and labels Parameter Constant NAME Record RecordField Procedure Array and files 9/3/2012 10
  • 11.
    SYMBOL TABLE-ATTRIBUTES • Eachpiece of information associated with a name is called an attribute. • Attributes are language dependent. • Different classes of Symbols have different Attributes Variable, Procedure or Array Constants function • Type , Line • Number of • # of number parameters, Dimensions, where parameters Array declared themselves, bounds. , Lines result type. where referenced , Scope 9/3/2012 11
  • 12.
    WHO CREATES SYMBOLTABLE??  Identifiers and attributes are entered by the analysis phases when processing a definition (declaration) of an identifier  In simple languages with only global variables and implicit declarations:  The scanner can enter an identifier into a symbol table if it is not already there  In block-structured languages with scopes and explicit declarations:  The parser and/or semantic analyzer enter identifiers and corresponding attributes 9/3/2012 12
  • 13.
    USE OF SYMBOLTABLE • Symbol table information is used by the analysis and synthesis phases • To verify that used identifiers have been defined (declared) • To verify that expressions and assignments are semantically correct – type checking • To generate intermediate or target code 9/3/2012 13
  • 14.
    IMPLEMENTATION OF SYMBOLTABLE • Each entry in the symbol table can be implemented as a record consisting of several field. • These fields are dependent on the information to be saved about the name • But since the information about a name depends on the usage of the name the entries in the symbol table records will not be uniform. • Hence to keep the symbol tables records uniform some information are kept outside the symbol table and a pointer to this information is stored in the symbol table record. 9/3/2012 14
  • 15.
    a int LB1 UB1 SYMBOL TABLE A pointer steers the symbol table to remotely stored information for array a. 9/3/2012 15
  • 16.
    WHERE SHOULD NAMESBE HELD?? • If there is modest upper bound on the length of the name , then the name can be stored in the symbol table record itself. • But If there is no such limit or the limit is already reached then an indirect scheme of storing name is used. • A separate array of characters called a ‘string table’ is used to store the name and a pointer to the name is kept in the symbol table record 9/3/2012 16
  • 17.
    LB1 int UB1 SYMBOL TABLE A B STRING TABLE 9/3/2012 17
  • 18.
    SYMBOL TABLE ANDSCOPE • Symbol tables typically need to support multiple declarations of the same identifier within a program. The scope of a declaration is the portion of a program to which the declaration applies. • We shall implement scopes by setting up a separate symbol table for each scope. 9/3/2012 18
  • 19.
     The rulesgoverning the scope of names in a block- structured language are as follows 1. A name declared within a block B is valid only within B. 2. If block B1 is nested within B2, then any name that any name that is valid for B2 is also valid for B1,unless the identifier for that name is re-declared in B1.  The scope rules required a more complicated symbol table organization than simply a list association between names and attributes.  Each table is list names and there associated attributes and the tables are organized into a stack. 9/3/2012 19
  • 20.
    SYMBOL TABLE ORGANIZATION TOP z Real Y Real Symbol table for block q x Real Var x,y : integer Procedure P: q Real Var x,a :boolean; a Real Symbol table for p Procedure q: x Real Var x,y,z : real; begin …… P Proc end Symbol table for main Y Integer begin X Integer ….. 9/3/2012 20 End
  • 21.
    NESTING DEPTH • Anothertechnique can be used to represent scope information in the symbol table. • We store the nesting depth of each procedure block in the symbol table and use the [procedure name , nesting depth] pairs as the key to accessing the information from the table. A nesting depth of a procedure is a number that is obtained by starting with a value of one for the main and adding one to it every time we go from an enclosing to an enclosed procedure. • This number is basically a count of how many procedures are there in the referencing environment of the procedure . 9/3/2012 21
  • 22.
    Var x,y :integer x 3 Real Procedure P: y 3 Real Var x,a :boolean; z 3 Real Procedure q: q 2 Proc Var x,y,z : real; a 2 Boolean begin x 2 Boolean …… end P 1 Proc begin y 1 Integer ….. z 1 integer End 9/3/2012 22
  • 23.
    SYMBOL TABLE DATASTRUCTURES  Issues to consider : Operations required • Insert – Add symbol to symbol table • Look UP – Find symbol in the symbol table (and get its attributes)  Insertion is done only once  Look Up is done many times  Need Fast Look Up  The data structure should be designed to allow the compiler to find the record for each name quickly and to store or retrieve data from that record quickly. 9/3/2012 23
  • 24.
    LINKED LIST  Alinear list of records is the easiest way to implement symbol table.  The new names are added to the symbol table in the order they arrive.  Whenever a new name is to be added to be added it is first searched linearly or sequentially to check if or the name is already present in the table or not and if not , it is added accordingly. • Time complexity – O(n) • Advantage – less space , additions are simple • Disadvantages - higher access time. 9/3/2012 24
  • 25.
    UNSORTED LIST 01 PROGRAMMain 02 GLOBAL a,b 03 PROCEDURE P (PARAMETER x) 04 LOCAL a 05 BEGIN {P} Look up Complexity 06 …a… On 07 …b… 08 …x… 09 END {P} 10 BEGIN{Main} 11 Call P(a) 12 END {Main} Name Characteristic Class Scope Other Attributes Declared Referenced Other Main Program 0 Line 1 a Variable 0 Line 2 Line 11 b Variable 0 Line 2 Line 7 P Procedure 0 Line 3 Line 11 1, parameter, x x Parameter 1 Line 3 Line 8 a 9/3/2012 Variable 1 Line 4 Line 6 25
  • 26.
    SORTED LIST 01 PROGRAMMain 02 GLOBAL a,b 03 PROCEDURE P (PARAMETER x) Look up Complexity 04 LOCAL a O log 2 n 05 BEGIN {P} 06 …a… If stored as array (complex insertion) 07 …b… 08 …x… Look up Complexity On 09 END {P} 10 BEGIN{Main} If stored as linked list (easy insertion) 11 Call P(a) 12 END {Main} Name Characteristic Class Scope Other Attributes Declared Reference Other a Variable 0 Line 2 Line 11 a Variable 1 Line 4 Line 6 b Variable 0 Line 2 Line 7 Main Program 0 Line 1 P Procedure 0 Line 3 Line 11 1, parameter, x x 9/3/2012 Parameter 1 Line 3 Line 8 26
  • 27.
    SEARCH TREES • Efficientapproach for symbol table organisation • We add two links left and right in each record in the search tree. • Whenever a name is to be added first the name is searched in the tree. • If it does not exists then a record for new name is created and added at the proper position. • This has alphabetical accessibility. 9/3/2012 27
  • 28.
    BINARY TREE Main Program 0 Line1 P Procedure 1 Line3 Line11 x Parameter 1 Line3 Line8 a Variable 0 Line2 Line11 b Variable 0 Line2 Line7 a Variable 1 9/3/2012 Line4 Line6 28
  • 29.
    BINARY TREE Lookup complexity if tree balanced O log 2 n Lookup complexity if tree unbalanced On 9/3/2012 29
  • 30.
    HASH TABLE • Tableof k pointers numbered from zero to k-1 that points to the symbol table and a record within the symbol table. • To enter a name in to the symbol table we found out the hash value of the name by applying a suitable hash function. • The hash function maps the name into an integer between zero and k-1 and using this value as an index in the hash table. 9/3/2012 30
  • 31.
    HASH TABLE -EXAMPLE M n a b P x 77 110 97 98 80 120 PROGRAM Main 0 Main Program 0 Line1 GLOBAL a,b 1 PROCEDURE 2 P(PARAMETER x) 3 LOCAL a BEGIN (P) 4 …a… 5 …b… 6 P Procedure 1 Line 3 …x… 7 a Variable 0 Line2 1 Line4 a Variable 0 Line2 END (P) 8 BEGIN (Main) 9 b Variable 0 1Line2 x Parameter Line3 b Variable 0 Line2 Call P(a) 10 End (Main) H(Id) 9/3/2012 = (# of first letter + # of last letter) mod 11 31
  • 32.
    REFERENCES  O.G.Kakde - Compiler design  Compilers Principles, Techniques & Tools - Second Edition : Alfred V Aho , Monica S Lam, Ravi Sethi, Jeffery D Ullman 9/3/2012 32
  • 33.