CSI 3125, Data Types, page 1
Data types
Outline
• Primitive data types
• Structured data types
• Strings
• Enumerated types
• Arrays
• Records
• Pointers
CSI 3125, Data Types, page 2
Primitive data types
Points
• Numeric types
• Booleans
• Characters
CSI 3125, Data Types, page 3
Data types: introduction
A data type is not just a set of objects.
We must consider all operations on
these objects. A complete definition of
a type must include a list of operations,
and their definitions.
Primitive data objects are close to
hardware, and are represented directly
(or almost directly) at the machine
level—usually word, byte, bit.
CSI 3125, Data Types, page 4
Integer types
An integer type is a finite approximation
of the infinite set of integer numbers:
{0, 1, -1, 2, -2, ...}.
Various kinds of integers:
signed—unsigned, long—short—small.
Hardware implementations of integers:
one's complement, two's complement, ...
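Because an integer type is only a finite approximation, arithmetic can leave its range. The sketch below (C99 assumed) shows one way to detect this before it happens; the name `checked_add` is illustrative and does not come from the slides.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Adds a and b. Stores the result in *sum and returns true only when the
   mathematical sum fits in an int; on failure *sum is left untouched. */
bool checked_add(int a, int b, int *sum)
{
    assert(sum != NULL);
    if ((b > 0 && a > INT_MAX - b) || (b < 0 && a < INT_MIN - b))
        return false;   /* the true sum lies outside [INT_MIN, INT_MAX] */
    *sum = a + b;
    return true;
}
```

For example, `checked_add(INT_MAX, 1, &s)` returns false instead of overflowing.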
CSI 3125, Data Types, page 5
Floating-point types
All "real" numbers in computers are finite
approximations of the non-denumerable
set of real numbers.
Precision and range of values are defined
by the language or by the programmer.
Hardware implementations (used by
floating-point processors): exponent and
mantissa.
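A small sketch of what "finite approximation" means in practice. It assumes IEEE 754 single precision (a 24-bit mantissa), which is common but not required by the C standard; `float_holds_exactly` is a hypothetical helper name.

```c
#include <assert.h>
#include <stdbool.h>

/* Returns true when the integer n survives a round trip through float.
   With a 24-bit mantissa, not every integer above 2^24 is representable. */
bool float_holds_exactly(long n)
{
    assert(n >= -1000000000L && n <= 1000000000L); /* keeps the cast back in range */
    float f = (float)n;      /* may round to the nearest representable value */
    return (long)f == n;
}
```

Under that assumption, 16777216 (2^24) is held exactly, while 16777217 is rounded and the function returns false.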
CSI 3125, Data Types, page 6
Boolean type
This type is not supported by all languages
(for example, it is not available in PL/I, in C
before C99, or in Perl).
The values are true and false. Operations
are as in classical two-valued propositional
logic.
Hardware implementation: a single bit or a
byte (this allows more efficient operations).
CSI 3125, Data Types, page 7
Character types
This is usually ASCII, but extended character
sets (Unicode, ISO) are often used.
Accented characters (é, à, ü, etc.) do not fit within
7-bit ASCII. 8-bit extensions of ASCII hold them,
though there is no single standard.
Chinese or Japanese are examples of writing
systems that require character sets of many
more than 256 elements.
Hardware implementation: a byte (ASCII,
EBCDIC), two bytes (16-bit Unicode encodings) or
several bytes (for example UTF-8 or UTF-32).
CSI 3125, Data Types, page 8
Other primitive types
• word (for example, in Modula-2)
• byte, bit (for example, in PL/I)
• pointer (for example, in C)
CSI 3125, Data Types, page 9
Structured data types
Points
• Strings
• Enumerated types
• Arrays
• Records
CSI 3125, Data Types, page 10
Strings
A string is a sequence of characters. It may be:
• a special data type (its objects can be
decomposed into characters)—Fortran, Basic;
• an array of characters—Pascal, Ada;
• a list of characters—Prolog;
• consecutively stored characters—C.
The syntax is the same: characters in quotes.
Pascal has one kind of quotes, Ada has two:
'A' is a character, "A" is a string.
CSI 3125, Data Types, page 11
String operations
Typical operations on strings
string × string → string
concatenation
string × int × int → string
substring
string → characters
decompose into an array or list
characters → string
convert an array or list into a string
CSI 3125, Data Types, page 12
String operations (2)
string → integer
length
string → boolean
is it empty?
string × string → boolean
equality, ordering
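The operations listed on these two slides can be sketched in C, where a string is a sequence of consecutively stored characters ending in '\0'. The function names (`str_concat`, `str_substring`, and so on) are illustrative, and the choice of a 0-based start plus a length for substring is an assumption; length itself is already provided by `strlen`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* string x string -> string. The caller frees the result. */
char *str_concat(const char *a, const char *b)
{
    assert(a != NULL && b != NULL);
    size_t la = strlen(a), lb = strlen(b);
    char *r = malloc(la + lb + 1);
    if (r == NULL)
        return NULL;
    memcpy(r, a, la);
    memcpy(r + la, b, lb + 1);   /* also copies the terminating '\0' */
    return r;
}

/* string x int x int -> string: len characters from 0-based start.
   Out-of-range requests are clamped. The caller frees the result. */
char *str_substring(const char *s, size_t start, size_t len)
{
    assert(s != NULL);
    size_t n = strlen(s);
    if (start > n)
        start = n;
    if (len > n - start)
        len = n - start;
    char *r = malloc(len + 1);
    if (r == NULL)
        return NULL;
    memcpy(r, s + start, len);
    r[len] = '\0';
    return r;
}

/* string -> boolean */
bool str_is_empty(const char *s)
{
    assert(s != NULL);
    return s[0] == '\0';
}

/* string x string -> boolean: equality and (byte-wise) ordering */
bool str_equal(const char *a, const char *b) { return strcmp(a, b) == 0; }
bool str_less(const char *a, const char *b)  { return strcmp(a, b) < 0; }
```

Here `str_concat("data", "type")` produces "datatype", and `str_substring` of that result with start 4 and length 4 produces "type".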
CSI 3125, Data Types, page 13
More string operations…
Specialized string manipulation languages
(Snobol, Icon) include built-in pattern
matching, sometimes very complicated,
with extremely elaborate backtracking.
Another language that works very well
with strings is, of course, Perl.
CSI 3125, Data Types, page 14
Fixed- and variable-length strings
The allowed length of strings is a design
issue:
fixed-length strings—Pascal, Ada, Fortran;
variable-length strings—C, Java, Perl.
A character may be treated as a string of
length 1, or as a separate data structure.
Many languages (Pascal, Ada, C, Prolog)
treat strings as special cases of arrays or
lists.
Operations on strings are the same as on arrays or
lists of other types. For example, every character in
a character array is available at once, whereas a list
of characters must be searched linearly.
CSI 3125, Data Types, page 15
Enumerated types
Also called: user-defined ordinal types
[read Section 6.4]
We can declare a list of symbolic constants that are
to be treated literally, just like in Prolog or Scheme.
We also specify the implicit ordering of those newly
introduced symbolic constants. In Pascal:
type day =
(mo, tu, we, th, fr, sa, su);
Here, we have mo < tu < we < th < fr < sa < su.
CSI 3125, Data Types, page 16
Operators for enumerated types
Pascal supplies the programmer with three generic
operations for every new enumerated type T:
succ: successor, for example, succ(tu) = we
pred: predecessor, for example, pred(su) = sa
(each is undefined at one end)
ord: (ordinal) position in the type, starting at 0, so for
example ord(th) = 3
For characters, Pascal also has chr, producing the
character at a given position, so for example
chr(65) returns 'A'.
CSI 3125, Data Types, page 17
Enumerated types in Ada
Ada makes these generic operations complete:
succ: successor,
pred: predecessor,
pos: position,
val: constant at position.
Ada also supplies type attributes, among them
FIRST and LAST:
day'FIRST = mo, day'LAST = su
CSI 3125, Data Types, page 18
Reuse of symbolic constants
A design issue: is the symbolic constant allowed in
more than one type? In Pascal, no. In Ada, yes:
type stoplight is
(red, amber, green);
type rainbow is
(violet, indigo, blue, green,
yellow, orange, red);
Qualified expressions, similar to type casts, prevent
any confusion: we can write stoplight'(red) or
rainbow'(red).
CSI 3125, Data Types, page 19
Implementation of enumerated types
Map the constants c1, ..., ck into
small integers 0, ..., k-1.
Enumerated types help increase clarity
and readability of programs by separating
concepts from their numeric codes.
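C enumerations follow exactly this mapping, so the Pascal and Ada operations can be sketched on top of them. The helper names (`day_ord`, `day_succ`, `day_pred`, `day_val`) are made up for illustration; C itself supplies none of them.

```c
#include <assert.h>

/* C assigns 0, 1, ..., 6 to the constants in declaration order. */
enum day { MO, TU, WE, TH, FR, SA, SU };

/* ord / pos: position of the constant, starting at 0. */
int day_ord(enum day d) { return (int)d; }

/* succ is undefined for the last constant, so we check first. */
enum day day_succ(enum day d)
{
    assert(d < SU);
    return (enum day)(d + 1);
}

/* pred is undefined for the first constant. */
enum day day_pred(enum day d)
{
    assert(d > MO);
    return (enum day)(d - 1);
}

/* val (as in Ada): the constant at a given position. */
enum day day_val(int pos)
{
    assert(pos >= 0 && pos <= SU);
    return (enum day)pos;
}
```

With these definitions `day_succ(TU)` is `WE`, `day_pred(SU)` is `SA`, and `day_ord(TH)` is 3, matching the Pascal examples.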
CSI 3125, Data Types, page 20
Arrays
An array represents a mapping:
index_type → component_type
The index type must be a discrete type (integer,
character, enumeration etc). In some languages this
type is specified implicitly:
an array of size N is indexed 0…N-1 in C++ / Java /
Perl, but in Fortran it is 1…N. In Algol, Pascal, Ada
the lower and upper bound must be both given.
There are normally few restrictions on the component
type (in some languages we can even have arrays of
procedures or files).
CSI 3125, Data Types, page 21
Multidimensional arrays
Multidimensional arrays can be defined in two
ways (for simplicity, we show only dimension 2):
index_type1 × index_type2 → component_type
This corresponds to references such as A[I,J].
Algol, Pascal, Ada work like this.
index_type1 → (index_type2 → component_type)
This corresponds to references such as A[I][J].
Java works like this.
Perl sticks to one dimension.
CSI 3125, Data Types, page 22
Operations on arrays (1)
select an element (get or change its value):
A[J]
select a slice of an array:
(read the textbook, Section 6.5.7)
assign a complete array to a complete array:
A := B;
There is an implicit loop here.
CSI 3125, Data Types, page 23
Operations on arrays (2)
Compute an expression with complete arrays
(this is possible in extensible or specialized
languages, for example in Ada):
V := W + U;
If V, W, U are arrays, this may denote array
addition. All three arrays must be compatible
(the same index and component type), and
addition is probably carried out element by
element.
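C has neither whole-array assignment nor whole-array arithmetic, so the implicit loops have to be written out. This is a sketch for arrays of int only; `array_assign` and `array_add` are illustrative names.

```c
#include <assert.h>
#include <stddef.h>

/* A := B for arrays of n ints: the implicit loop made explicit. */
void array_assign(int *a, const int *b, size_t n)
{
    assert(a != NULL && b != NULL);
    for (size_t i = 0; i < n; i++)
        a[i] = b[i];
}

/* V := W + U, carried out element by element.
   All three arrays must hold n ints (same index and component type). */
void array_add(int *v, const int *w, const int *u, size_t n)
{
    assert(v != NULL && w != NULL && u != NULL);
    for (size_t i = 0; i < n; i++)
        v[i] = w[i] + u[i];
}
```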
CSI 3125, Data Types, page 24
Subscript binding
static: fixed size, static allocation
this is done in older Fortran.
semistatic: fixed size, dynamic allocation
Pascal.
semidynamic: size determined at run time,
dynamic allocation
Ada
dynamic: size fluctuates during execution,
flexible allocation required
Algol 68, APL—both little used...
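The last category, where the size fluctuates during execution, can be approximated in C with heap storage that is reallocated on demand. This is only a sketch: `struct int_vector` and `vector_push` are hypothetical names, and doubling the capacity is one possible growth policy.

```c
#include <assert.h>
#include <stdlib.h>

/* A growable array of int. Start with {NULL, 0, 0}. */
struct int_vector {
    int *data;
    size_t size;       /* number of elements in use */
    size_t capacity;   /* number of elements allocated */
};

/* Appends value, growing the storage when it is full.
   Returns 1 on success and 0 if memory could not be obtained. */
int vector_push(struct int_vector *v, int value)
{
    assert(v != NULL);
    if (v->size == v->capacity) {
        size_t new_cap = v->capacity ? v->capacity * 2 : 4;
        int *p = realloc(v->data, new_cap * sizeof *p);
        if (p == NULL)
            return 0;          /* old storage is still valid */
        v->data = p;
        v->capacity = new_cap;
    }
    v->data[v->size++] = value;
    return 1;
}
```

The caller releases the storage with `free(v.data)` when the array is no longer needed.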
CSI 3125, Data Types, page 25
Array-type constants and initialization
Many languages allow initialization of arrays to be
specified together with declarations:
C:
int vector[] = {10, 20, 30};
Ada:
vector: array(0..2)
of integer := (10, 20, 30);
Array constants in Ada:
type temp is array(mo..su) of -40..40;
T: temp;
T := (15,12,18,22,22,30,22);
T := (mo=>15, we=>18, tu=>12,
sa=>30, others=>22);
T := (15,12,18, sa=>30, others=>22);
CSI 3125, Data Types, page 26
Implementing arrays (1)
The only issue is how to store arrays and access their
elements—operations on the component type decide
how the elements are manipulated.
An array is represented during execution by an array
descriptor. It tells us about:
the index type,
the component type,
the address of the array, that is, the data.
CSI 3125, Data Types, page 27
Implementing arrays (2)
Specifically, we need:
the lower and upper bound (for subscript checking),
the base address of the array,
the size of an element.
We also need the subscript—it gives us the offset (from
the base) in the memory area allocated to the array.
A multi-dimensional array will be represented by a
descriptor with more lower-upper bound pairs.
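A one-dimensional descriptor with subscript checking might look as follows in C. The layout of `struct array_descriptor` and the name `element_address` are assumptions made for illustration; real compilers choose their own representation.

```c
#include <assert.h>
#include <stddef.h>

/* Run-time description of a one-dimensional array. */
struct array_descriptor {
    int low, high;          /* lower and upper bound of the index type */
    size_t elem_size;       /* size of one element in bytes */
    unsigned char *base;    /* address of the element with subscript low */
};

/* Returns the address of the element with the given subscript,
   or NULL when the subscript check fails. */
void *element_address(const struct array_descriptor *d, int subscript)
{
    assert(d != NULL && d->base != NULL);
    if (subscript < d->low || subscript > d->high)
        return NULL;
    size_t offset = (size_t)(subscript - d->low) * d->elem_size;
    return d->base + offset;
}
```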
CSI 3125, Data Types, page 28
Implementing multidimensional arrays
The array (entry rc is in row r, column c):
11 12 13 14 15
21 22 23 24 25
31 32 33 34 35
Row-major order (second subscript increases faster):
11 12 13 14 15 21 22 23 24 25 31 32 33 34 35
Column-major order (first subscript increases faster):
11 21 31 12 22 32 13 23 33 14 24 34 15 25 35
CSI 3125, Data Types, page 29
Implementing multidimensional arrays (2)
Suppose that we have this array:
A: array [LOW1..HIGH1,
LOW2..HIGH2] of ELT;
where the size of each entity of type ELT is
SIZE.
The address calculation is done here for row-major
order (calculations for column-major are quite
similar). We need the base, for example
the address LOC of A[LOW1, LOW2].
CSI 3125, Data Types, page 30
Implementing multidimensional arrays (3)
We can calculate the address of A[I,J] in
row-major order, given the base.
Let the length of each row in the array be:
ROWLENGTH = HIGH2 - LOW2 + 1
The address of A[I,J] is:
(I - LOW1) * ROWLENGTH * SIZE +
(J - LOW2) * SIZE + LOC
CSI 3125, Data Types, page 31
Implementing multidimensional arrays (4)
Here is an example.
VEC: array [1..10, 5..24] of integer;
The length of each row in the array is:
ROWLENGTH = 24 - 5 + 1 = 20
Let the base address be 1000, and let the size of
an integer be 4.
The address of VEC[i,j] is:
(i - 1) * 20 * 4 + (j - 5) * 4 + 1000
For example, VEC[7,16] is located in 4 bytes at
1524 = (7 - 1) * 20 * 4 +
(16 - 5) * 4 + 1000
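The row-major formula translates directly into code. The sketch below also writes out the column-major variant that the slides call "quite similar"; that variant is a reconstruction, not something stated in the slides. Addresses are modelled as plain integers, and `struct bounds2d` and the function names are illustrative.

```c
#include <assert.h>

/* Bounds of A: array [low1..high1, low2..high2]. */
struct bounds2d { long low1, high1, low2, high2; };

/* Address of A[i,j] in row-major order. loc is the address of
   A[low1, low2] and size is the size of one element. */
long row_major_address(struct bounds2d b, long loc, long size, long i, long j)
{
    assert(i >= b.low1 && i <= b.high1 && j >= b.low2 && j <= b.high2);
    long row_length = b.high2 - b.low2 + 1;
    return (i - b.low1) * row_length * size + (j - b.low2) * size + loc;
}

/* Column-major counterpart: whole columns are skipped first,
   so the roles of the two subscripts are exchanged. */
long column_major_address(struct bounds2d b, long loc, long size, long i, long j)
{
    assert(i >= b.low1 && i <= b.high1 && j >= b.low2 && j <= b.high2);
    long column_length = b.high1 - b.low1 + 1;
    return (j - b.low2) * column_length * size + (i - b.low1) * size + loc;
}
```

For VEC with bounds {1, 10, 5, 24}, base 1000 and element size 4, `row_major_address` gives 1524 for VEC[7,16], as computed above; the column-major version gives 1464 for the same element.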
CSI 3125, Data Types, page 32
Languages without arrays
A final word on arrays: they are not supported
by standard Prolog and pure Scheme. An array
can be simulated by a list, which is the basic
data structure in Scheme and a very important
data structure in Prolog.
Assume that the index type is always 1..N.
Treat a list of N elements:
[x1, x2, ..., xN] (Prolog)
(x1 x2 ... xN) (Scheme)
as the (structured) value of an array.
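The same simulation can be sketched in C with a singly linked list, which shows the cost: reaching element i takes a linear walk. `struct node` and `list_nth` are illustrative names, and the 1..N indexing follows the assumption above.

```c
#include <assert.h>
#include <stddef.h>

/* One cell of a singly linked list of int. */
struct node {
    int value;
    struct node *next;
};

/* Returns the cell playing the role of A[i] (1-based),
   or NULL when the list has fewer than i cells. */
struct node *list_nth(struct node *head, int i)
{
    assert(i >= 1);
    while (head != NULL && i > 1) {
        head = head->next;   /* walk past one element */
        i--;
    }
    return head;
}
```

Both reading and updating go through the returned cell, for example `list_nth(list, 2)->value = 99;`.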
CSI 3125, Data Types, page 33
Records
A record is a heterogeneous collection of fields
(components); this differs from homogeneous
arrays.
Records are supported by a majority of
important languages, beginning with Cobol,
through Pascal, PL/I, Ada, C (where they are
called structures), Prolog (!) to C++.
Java had no records before version 16; classes
replace them. Perl has no record type (hashes
usually play that role).
CSI 3125, Data Types, page 34
Ada records—syntax
type date is record
day: 1..31;
month: 1..12;
year: 1000..9999;
end record;
type person is record
name: record
fname: string(1..20);
lname: string(1..20);
end record;
born: date;
gender: (F, M);
end record;
CSI 3125, Data Types, page 35
Fields
A field is distinguished by a name rather than an
index. Iteration on elements of an array is natural
and very useful, but iteration on fields of a record is
not possible (why?).
A field is indicated by a qualified name. In Ada:
X, Y: person;
X.born.day := 15;
X.born.month := 11;
X.born.year := 1964;
Y.born := (23, 9, 1949);
Y.name.fname(1..8) := "Smithson";
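For comparison, here is a rough C counterpart of the Ada declarations, using structures and the same qualified field names. C has no range subtypes, so the sketch checks the ranges by hand; `make_date`, `set_fname` and the `GENDER_` prefix are illustrative choices.

```c
#include <assert.h>
#include <string.h>

enum gender { GENDER_F, GENDER_M };

struct date {
    int day, month, year;
};

struct person {
    struct {
        char fname[21];   /* 20 characters plus the terminating '\0' */
        char lname[21];
    } name;
    struct date born;
    enum gender gender;
};

/* Builds a date after checking the ranges that Ada's subtypes enforce. */
struct date make_date(int day, int month, int year)
{
    assert(day >= 1 && day <= 31);
    assert(month >= 1 && month <= 12);
    assert(year >= 1000 && year <= 9999);
    struct date d = { day, month, year };
    return d;
}

/* Stores a first name of at most 20 characters. */
void set_fname(struct person *p, const char *s)
{
    assert(p != NULL && s != NULL && strlen(s) <= 20);
    strcpy(p->name.fname, s);
}
```

Field selection then reads as in Ada, for example `x.born.day = 15;` or `y.born = make_date(23, 9, 1949);`.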
CSI 3125, Data Types, page 36
Operations on records (1)
Selection of a component is done by field name.
Construction of a record from components—either from
separate fields, or as a complete record in a structured
constant.
D := (month => 10, day => 15, year => 1994);
D := (day => 15, month => 10, year => 1994);
D := (15, 10, 1994);
D := (15, 10, year => 1994);
Note that an array can also be assigned such a constant.
Interpretation depends on context.
A: array(1..3)of integer;
A := (15, 10, 1994);
CSI 3125, Data Types, page 37
Operations on records (2)
Assignment of complete records is allowed in
Ada, and is done field by field.
Records can be compared for equality or
inequality, regardless of their structure or type
of components. No generic standard ordering
of records exists, but specific ordering can be
defined by the programmer.
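In C the picture is slightly different: structures can be assigned as a whole, but even equality has to be written by the programmer. The sketch below defines equality field by field and one possible ordering (chronological) for dates; `date_equal` and `date_compare` are illustrative names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct date {
    int day, month, year;
};

/* Equality, decided field by field. */
bool date_equal(const struct date *a, const struct date *b)
{
    assert(a != NULL && b != NULL);
    return a->day == b->day && a->month == b->month && a->year == b->year;
}

/* A programmer-defined ordering: chronological.
   Returns a negative, zero or positive value, in the style of strcmp. */
int date_compare(const struct date *a, const struct date *b)
{
    assert(a != NULL && b != NULL);
    if (a->year != b->year)
        return a->year < b->year ? -1 : 1;
    if (a->month != b->month)
        return a->month < b->month ? -1 : 1;
    if (a->day != b->day)
        return a->day < b->day ? -1 : 1;
    return 0;
}
```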
CSI 3125, Data Types, page 38
More on records
Ada allows default values for fields:
type date is record
day: 1..31; month: 1..12;
year: 1000..9999 := 2002;
end record;
D: date; -- D.year is now 2002
There are almost no restrictions on field types.
Any combination of records and arrays (any
depth) is usually possible. A field could also be
a file or even a procedure!
CSI 3125, Data Types, page 39
The Prolog equivalent of records
Records—or rather terms—in Prolog can carry their type and
their components around:
date(day(15), month(10), year(1994))
person(
name(fname("Jim"), lname("Berry")),
born(date(day(15), month(10), year(1994))),
gender(male)
)
If we can assure correct use, this can be simplified by dropping
one-argument "type" functors:
date(15, 10, 1994)
person(name("Jim", "Berry"),
born(date(15, 10, 1994)),
male)
CSI 3125, Data Types, page 40
Back to pointers
[Note: We’re skipping 6.9.9]
A pointer variable has addresses as values (and a
special address nil or null for "no value"). They are used
primarily to build structures with unpredictable shapes
and sizes—lists, trees, graphs—from small fragments
allocated dynamically at run time.
A pointer to a procedure is possible, but normally we
have pointers to data (simple and composite). An
address, a value and usually a type of a data item
together make up a variable. We call it an anonymous
variable: no name is bound to it. Its value is accessed
by dereferencing the pointer.
CSI 3125, Data Types, page 41
Back to pointers (2)
Note that, as with normal named variables, in this:
p^ := 23;
we mean the address of p^ (the value of p).
In this:
m := p^;
we mean the value of p^.
[Figure: p holds the address of an anonymous variable that contains 17, so value(p) is that address and value(p^) = 17.]
Pointers in Pascal are quite well designed.
CSI 3125, Data Types, page 42
Pointer variable creation
A pointer variable is declared explicitly and has
scope and lifetime as usual.
An anonymous variable has no scope (because it has
no name) and its lifetime is determined by the
programmer. It is created (in a special memory area
called the heap) by the programmer, for example:
new(p); in Pascal
p = malloc(4); in C
and destroyed by the programmer:
dispose(p); in Pascal
free(p); in C
CSI 3125, Data Types, page 43
Pointer variable creation (2)
If an anonymous variable exists outside
the scope of the explicit pointer variable,
we have "garbage" (a lost object). If an
anonymous variable has been destroyed
inside the scope of the explicit pointer
variable, we have a dangling reference.
new(p);
p^ := 23;
dispose(p);
......
if p^ > 0 {???}
CSI 3125, Data Types, page 44
Pointer variable creation (3)
Producing garbage, an example in Pascal:
new(p); p^ := 23; new(p);
{the anonymous variable with the value 23
becomes inaccessible}
Garbage collection is the process of
reclaiming inaccessible storage. It is usually
complex and costly. It is essential in
languages whose implementation relies on
pointers: Lisp, Prolog.
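In a language without garbage collection, both problems are usually avoided by discipline. The C sketch below shows one such discipline, under the assumption that each anonymous variable has a single owning pointer; `new_int`, `dispose_int` and `renew_int` are hypothetical helpers modelled on Pascal's new and dispose.

```c
#include <assert.h>
#include <stdlib.h>

/* Creates an anonymous int on the heap, like new(p); p^ := value.
   Returns NULL if no memory is available. */
int *new_int(int value)
{
    int *p = malloc(sizeof *p);
    if (p != NULL)
        *p = value;
    return p;
}

/* Destroys the anonymous variable and clears the pointer, so that a
   later test against NULL catches what would be a dangling reference. */
void dispose_int(int **pp)
{
    assert(pp != NULL);
    free(*pp);
    *pp = NULL;
}

/* Points *pp at a fresh anonymous variable without losing the old one.
   Returns 1 on success, 0 if allocation failed (then *pp is unchanged). */
int renew_int(int **pp, int value)
{
    assert(pp != NULL);
    int *fresh = new_int(value);
    if (fresh == NULL)
        return 0;
    free(*pp);        /* without this, the old variable becomes garbage */
    *pp = fresh;
    return 1;
}
```

After `dispose_int(&p)` the pointer p is NULL, so a check such as `if (p != NULL && *p > 0)` is safe where the Pascal fragment with `p^ > 0` was not.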
CSI 3125, Data Types, page 45
Pointers: types and operators
Pointers in PL/I are typeless. In Pascal, Ada,
C they are declared as pointers to types, so
that a dereferenced pointer (p^, *p) has a
fixed type.
Operations on pointers in C are quite rich:
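char b, c;
c = '\007';            /* octal escape: the BEL (bell) character */
/* Address arithmetic that ends up back at &c. Strictly, ISO C only
   defines a pointer one past an object, not one before it. */
b = *((&c - 1) + 1);
putchar(b);            /* requires <stdio.h> */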
