Variables: names, bindings, type, scope

Topics on Imperative Languages
Variables: names, bindings, type, scope Sebesta Chapter 5
Data Types Sebesta Chapter 6
Expressions Sebesta Chapter 7
Control Statements Sebesta Chapter 8
Subprograms: Concepts & Implementation Sebesta Chapters 9 & 10
Abstract Data Types Sebesta Chapter 11
Exception Handling Sebesta Chapter 14
Object Oriented languages Sebesta Chapter 12
CSC3403 Comparative Programming Languages Variables & Identifiers 1
Variable concepts
• Imperative languages map simply onto the von Neumann architecture
(linear memory, CPU, sequential execution)
• Variables correspond to memory cell(s)
– integers, floats (direct support for operations)
– record or structure
– (variables may be optimised to registers)
• Variable attributes
– name, value, type, lifetime, location, scope
• related issues: named constants, initialisation
Names
• synonym: identifiers (e.g. parent, amy, bill and cathy in the Prolog facts parent(amy,cathy). parent(bill,cathy).)
• applies also to names of functions, constants, types, etc.
• issues
– Maximum length
∗ limited: truncate, or report an error, if an identifier is too long
∗ unlimited: storage/implementation considerations
(This is really an implementation issue.)
– valid chars—typically: 0-9 a-z A-Z _~; first char non-numeric
– case sensitive?
– special words
∗ keywords — depend on context ⇒ dangerous (see next...)
∗ reserved words—e.g. if then return
∗ predefined names (system/library definitions—may be redefined)
writeln printf cout map fold
Example: Keywords considered dangerous
1962: NASA Mariner 1 Venus probe lost
• Early FORTRAN used keywords (not reserved words)
• Early FORTRAN compilers ignored spaces
• Programmer meant to write:
DO 5 I = 1, 3
...body of loop...
5 CONTINUE
• Problem: period instead of comma
DO 5 I = 1. 3
which is interpreted as the assignment:
DO5I = 1.3
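• Result (as the story is usually told): no loop is formed, so the body executes only once ⇒ program fails ⇒ Mariner 1 lost. (Most historical accounts attribute the actual loss to a transcription error in the guidance equations; the DO-loop bug is usually traced to Project Mercury software.)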

Name space
• identifiers can be partitioned into classes: e.g. variables, types, record
names, field names, enumerated constants, module names, etc...
• where ambiguity cannot occur, only require identifiers to be unique within
their name space
• a name space is defined by the class(es) of identifier(s) it contains
• E.g. allow field names and variable names to occupy separate name spaces,
but variables and constants should occupy the same name space.
struct s { int a; int s;};
int main(){ struct s s; s.s = 1; }
• Note: Modern OO languages also implement explicit programmer
specified namespaces; this is different from the language-defined
namespaces discussed here.
Variable location
• Each variable instance has a unique start address
• It will typically occupy a range of addresses
(Very few data types can be held in just 1 byte.)
• A variable instance is defined by its
1. name
2. declaration site — which subprogram/block
3. current block instance (only required if recursive call)
Example of variable instances
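There are 3 declarations of var in this program, but as many as 9 instances: the global, the local in f1, and one parameter instance for each of the 7 calls f2(6), f2(5), ..., f2(0).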
#include <stdio.h>
int var;                 /* 1: global variable */
void f1(void){
    int var;             /* 2: local variable */
    var = 42;
}
int f2(int var){         /* 3: parameter == local var */
    if (var == 0) return 1;
    else return var * f2(var - 1);
}
int main(void){
    f1();
    var = f2(6);
    printf("%d\n", var); /* prints 720 */
    return 0;
}
Variables Aliases
alias: 1 location but >1 name or access method
• explicit: Fortran EQUIVALENCE, C union
struct symtabEntry{ char *name;
int type;
union { int ival;
float fval;
char *sval;
} value;
}
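Memory layout (4-byte int, float and pointers): name at bytes 0-3, type at bytes 4-7, and the union value at bytes 8-11, where ival, fval and sval all overlap in the same 4 bytes.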

Variables Aliases ....
• indirect: via pointers
int x,*p,*q;
p=&x;
q=&x;
*p=42;
printf("meaning of life = %d\n", *q);
• indirect: call-by-reference parameters
void f(int *a, int *b) { (*a)++; (*b)++; }
...
f(&x, &x);
...
Comments
• Aliasing is a bad idea; reduces program readability (but variant records
can be useful in rare system programming applications)
• Better dynamic memory systems and abundant computer memory have
reduced or removed the need for aliasing
Variable type
A variable’s type defines (for language-specified types):
• the range of values that the variable can store (implementation dependent)
• the physical storage requirements (implementation dependent)
• the set of legal operations on the variable (language defined)
Variable value
• each variable instance has a value
• may be automatically initialised (see later)
• a value is contained in a set of contiguous locations, called a cell
• r-value is value (contents of cell)
• l-value is address (location of cell)
Lifetime & Storage
• when is a memory cell allocated?
allocation ≡ location binding
• when is a memory cell deallocated?
• lifetime is time from allocation to deallocation (binding to unbinding)
• 4 classes of variable
– static variables
– stack-dynamic variables (automatic)
– explicit dynamic variables
– implicit dynamic variables
Static variables
• allocated before run time (usually link or load time)
• executable file contains space for static variables.
• the binding exists for entire lifetime of the program
easy and cheap to access: the compiler can compute the address at compile time
handy for global variables which are used throughout program
useful for subprogram local variables (e.g. C static) when history is
required but variable is only needed in subprogram
# static local variables (and parameters) preclude recursion: every activation would share one copy

Stack-dynamic variables
• allocated when code containing declaration is executed
• elaboration or instantiation is the name of the binding process
• variables are stored on the run-time stack (RTS)
• each block or subprogram invocation allocates a chunk of storage
(activation record instance) on the RTS
• subprogram exit ⇒ storage deallocation
# disadvantage: slower access — address must be calculated at run time
advantages:
– enables recursion
– lower average memory usage
– smaller executables
– smaller libraries
Explicit dynamic variables
• explicitly created by programmer code
• not bound to a name: a pointer variable holds its address
struct stentry *stnode; // C++
stnode = new struct stentry; // allocate
...
delete stnode; // deallocate
• memory is allocated from the heap
• there is often a maximum heap size
ideal for dynamic structures whose size is unknown at compile time (trees,
lists, etc.)
# opportunity for programmer errors (pointers—see later)
# cost of reference (indirect via pointer: 2 memory accesses)
# cost of allocation/deallocation (heap management)
Implicit dynamic variables
• employed by dynamic languages such as APL and scripting languages
• storage allocated at assignment time
advantage: flexible
# disadvantage: costly, poorer error detection
Scope
• variable scope: from where can it be referenced? (where is it visible?)
• scope rules define how to resolve references to non-local variables
(there may be more than one non-local variable with the same name)
• a local variable is declared in current program unit
• each program unit introduces a new “scope”
– subprogram (a.k.a. procedure, function)
a named subprogram, usually with parameters
– block anonymous program unit
in C/C++: { .... }
– module named collection of definitions. (C++: namespace)
• two kinds
– static: compiler can resolve scope
– dynamic: scope resolved at run time

Static scoping rules
• to resolve reference to a variable
1. locate the variable’s declaration
2. retrieve variable’s attributes (type, address, etc)
• search algorithm
unit ⇐ referencing unit; found ⇐ False; done ⇐ False;
repeat
if variable declared in unit
then use this variable declaration
found ⇐ True; done ⇐ True
else if unit is not outermost
then unit ⇐ staticParent(unit)
else done ⇐ True
until done
if not found then report an error: undeclared variable
Static scoping rules ....
Equivalent recursive definition:
A variable (say x) is visible within a referencing unit if it is either
• declared within that unit, or
• visible from the static parent of the unit.
The referencing environment of a unit is the union of
• all identifiers declared local to the unit, and
• identifiers in the static parent’s referencing environment with names
different to those declared in this unit.
Nested scope
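Variables visible in each unit of the program below:
Unit   Visible variables
main   a
sub1   a b
sub2   a b c
sub3   a b d
sub4   a e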
procedure main;
  var a;
  procedure sub1;
    var b;
    procedure sub2;
      var c;
    begin
    end;
    procedure sub3;
      var d;
    begin
    end;
  begin
  end;
  procedure sub4;
    var e;
  begin
  end;
begin
end;
Hidden variables
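Which x each unit refers to (each declaration is labelled in a comment):
Unit   x referenced
main   xmain
sub1   x1
sub2   x2
sub3   x3
sub4   x4
procedure main;
  var x;            { xmain }
  procedure sub1;
    var x;          { x1: hides xmain }
    procedure sub2;
      var x;        { x2: hides x1 and xmain }
    begin
    end;
    procedure sub3;
      var x;        { x3: hides x1 and xmain }
    begin
    end;
  begin
  end;
  procedure sub4;
    var x;          { x4: hides xmain }
  begin
  end;
begin
end;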

Static scoping issues
• Aim: to restrict access to vars/procs only to those units that need them,
thus avoiding the chance of accidental programming errors caused by
improper access.
• Non-nested function declarations (as in C) allow unrestricted calling of
functions. (With some exceptions—see later.)
• Nested declarations allow procedure calls only to
Child procedure (but not grandchild, etc.)
# Any ancestor procedure
• problems (see discussion Sebesta §5.8.3)
– access not restrictive enough (ancestors)
– overuse of global variables to provide resource sharing
– key question: How can two subprograms share local variables such
that no other subprograms can see them?
modules, classes, namespaces (see later)
Case study: Scope in C
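Name       Where visible
x1         p1.c (hidden inside main by x2); p3.c after the extern declaration
x2         main only
x3         p2.c only
x4         f3 only (but with static lifetime)
f1,f3,f4   all files (external linkage)
f2         p2.c only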
p1.c:
int x;                  /* x1 */
int main(void){
    int x;              /* x2 */
    x = 2;
    return 0;
}
void f1(void){
    x = 1;              /* assigns x1 */
}
p2.c:
static int x;           /* x3 */
static void f2(void){
    x = 3;              /* assigns x3 */
}
p3.c:
void f3(void){
    static int x;       /* x4 */
    x = 4;
}
extern int x;           /* x1 */
void f4(void){
    x = 1;              /* assigns x1 */
}
Dynamic scope
• APL, SNOBOL, early Lisp
• scope resolved at run time
• search dynamic parent chain
dynamic parent ≡ calling procedure
callee automatically inherits caller’s locals
# cost of dynamic scope resolution
# difficulty of programming / debugging
Consider proc p() {a := a + 1; }
Q: Which instance of variable “a” is being updated?
A: Cannot tell from static structure of program!
You must trace through program execution to find out.
Binding concepts
• a binding is an association
– variable attribute ←→ value of attribute
– operation ←→ symbol
– executable code ←→ physical location
– conceptual/abstract ←→ concrete/value
• binding times (from early to late)
– language design (operator symbols)
– language implementation (range of data types)
– compile time (var ←→ type)
– link time (address of vars, functions in executable image file)
– load time (physical variable addresses)
– run time (dynamic variable addresses, DLL function addresses)

Static and Dynamic Binding
• static binding: bound before run time
• dynamic binding: bound during run time
• consider variable attributes
– name is always statically determined
– value is always dynamically bound (else it is a constant!)
– location and lifetime can safely be either static or dynamic
– type and scope can be either static or dynamic.
Static binding is safer and more readable for these attributes.
Type bindings — motivation
Q: Why do we need types?
A: At reference time, a variable must have a type so that either
1. on “read” (e.g. variable access), cell contents can be interpreted correctly
2. on “write” (e.g. assignment to variable), correct conversions are made
E.g. consider
int a,b; float c;
...
c = a + b;
Key design questions
• how is the type determined?
– explicit vs. implicit declarations
• when does variable–type binding occur?
– static vs. dynamic
How is (static) type specified/determined?
• explicit declarations
– mandatory (C++, Pascal, most modern languages)
– optional (uncommon; e.g. in K&R C, function return types default to int)
• implicit declarations
– Fortran, Basic, PL/1
– Fortran: first-letter convention: names beginning with I, J, K, L, M or N are INTEGER; all others are REAL
– others: first use convention
– disadvantage: typographical errors undetected
coutner = counter + 1
• inference (implicit declaration supported by optional explicit declarations)
– ML, Miranda, Haskell, Gofer
– all types can be statically inferred
– comparison of declared vs inferred types finds errors
Dynamic type binding
• APL, scripting languages (e.g. Perl, Tcl)
• automatic conversions required
advantage: flexibility, generic subprograms possible, e.g.:
inc(a) {return a+1;} // hypothetical syntax: a (and the result) can be any numeric type
# disadvantages
– typographical errors not caught (see previous example)
– cost of dynamic type inference
– interpreted implementations usually required
• Note: advanced languages (Ada, C++, Haskell) provide polymorphic
type-checked subprograms
– provides generic subprograms with static type checking

Typing concepts
• Type error: a type error occurs when an operator (or function) is applied
to an operand (e.g. a variable, but in general any expression) whose type is
not compatible with the operator’s parameter type.
If this error is not detected by the language implementation system ⇒
program bug!
int i=1; double f=0.0,*p; p = (double *) i; *p = f; /* compiles, but writes through an invalid address */
• Type checking: checking that an operator or subprogram is applied to
arguments of the correct type. (Assignment is considered a binary
operator).
• Compatible type: either a type which matches the operator’s definition or
a type which can be automatically converted, using language rules, into
the correct matching type.
• Coercion: automatic conversion of types.
Strong typing
Definition: All type errors in a program can be detected
• no imperative language is strongly typed
• typical problems: variant records, non-checked type casts
• many languages (e.g. Java, C#, Ada) are almost strongly typed
• modern functional languages are strongly typed (static typing)
• coercion weakens strong typing
Comment: strong typing is seen by most language researchers/designers as a
“good idea”, and most newer languages have very good type checking.
Type conversion examples
Consider “assignment” of a floating point to an integer.
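C: coercion, undetected type error
#include <stdio.h>
int i,j;
double a=999999999.5,
       b=9999999999.5;
int main(){ i = a; j = b; printf("i=%d j=%d\n",i,j); }
execution ⇒ i=999999999 j=-2147483648
(typical x86 result: converting an out-of-range double to int is undefined behaviour in C)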
Haskell: strong typing: explicit conversion + runtime check
Hugs> :t truncate
truncate :: (RealFrac a, Integral b) => a -> b
Hugs> (truncate 999999999.5)::Int
999999999
Hugs> (truncate 9999999999.5)::Int
Program error: arithmetic overflow
Type compatibility
• name type compatibility
– both variables appear in the same declaration
or both variables declared with the same type name
– C++ uses name compatibility
int main(){ // C++
struct point {int x,y;} p1,p2;
struct {int a,b;} p3;
struct point p4;
p1.x = 10; p1.y = 10;
p2=p1; // ok
p3=p1; // ERROR
p4=p1; // ok
}

Type compatibility ....
• structure type compatibility
– the types of both variables have the same structure
less restrictive than name compatibility
# somewhat more complex to implement
• in general, type compatibility applies to expressions, not just variables
• more examples: Sebesta §5.7
Named constants
• e.g. const int x = 10; // C++
• a variable which is initialised and cannot be assigned to
• value and address are bound at the same time
• optimisers may keep in register
advantage: readability, easy to modify code when it does not contain
“magic numbers”
• do not confuse with a named literal (a macro, replaced textually before compilation):
#define TABLESIZE 128
Initialisation
• variable can be initialised by declaration statement
• initialisation is dependent on the kind of variable:
– static variable: value is stored in executable file
– dynamic variable: implicit assignment statement executed at run time
For dynamic variables, initialisation is shorthand rather than an extra feature.
The following two C fragments are semantically identical:
void f () {
    int i = 42;
    ...
}

void f () {
    int i;
    i = 42;
    ...
}
• some languages guarantee to automatically initialise (e.g. integers to 0)
A few conclusions
• these features are considered “good”
static typing
strong typing
static scoping
automatic dynamic memory management (garbage collection)
• these features are considered “poor” (or use with caution!)
# aliasing
# implicit variable declaration
# unrestricted use of dynamic binding
# dynamic scoping
# dynamic type of variables
(dynamic binding is acceptable if used with care, e.g. C++ dynamic function binding)
