332 ch07


Published in: Education, Technology
  1. ITCS332: Organization of Programming Languages. Chapter 6: Data Types. ISBN 0-321-33025-0
  2. Chapter 6 Topics
     • Introduction
     • Primitive Data Types
     • Character String Types
     • User-Defined Ordinal Types
     • Array Types
     • Associative Arrays
     • Record Types
     • Union Types
     • Pointer and Reference Types
     ITCS332 by Dr. Abdel Fattah Salman 6-2
  3. Introduction
     • A data type defines a collection of data objects and a set of predefined operations on those objects.
     • How well the data types match the real-world problem space is an important factor in how easily programs can be written, so it is crucial that a language support an appropriate variety of data types and structures.
     • PL/I included many data types, with the intent of supporting a large range of applications.
     • A better approach, taken in ALGOL 68: provide a few basic types and a few flexible structure-defining operators that allow a user to design a data structure for each need.
     • User-defined types improve readability through the use of meaningful names for types.
     • User-defined types aid modifiability: a user can change the type of a category of variables in a program by changing only a type declaration statement.
     • The fundamental idea of an abstract data type is that the use of a type is separated from the representation and set of operations on values of that type.
     • The two most common structured (nonscalar) data types are arrays and records.
  4. Introduction
     • These data types are specified by type operators, or constructors. In the C-based languages, brackets, parentheses, and asterisks are the type operators used to specify arrays, functions, and pointers, respectively.
     • A descriptor is the collection of the attributes of a variable. In an implementation, a descriptor is a collection of memory cells that store the variable's attributes.
     • If all attributes are static, descriptors are needed only at compile time. They are built by the compiler and stored in the symbol table.
     • For dynamic attributes, part or all of the descriptor must be maintained during execution.
     • Descriptors are used for type checking and to build the code for allocation and deallocation operations.
     • The word object is associated with the value of a variable and the space it occupies.
     • An object represents an instance of a user-defined (abstract data) type.
     • In object-oriented languages, every instance of every class (predefined or user-defined) is an object.
     • One design issue for all data types: what operations are defined for variables of the type, and how are they specified?
  5. Primitive Data Types
     • Primitive data types: those not defined in terms of other data types.
     • Almost all programming languages provide a set of primitive data types.
       – Some primitive data types are merely reflections of the hardware.
       – Others require only a little non-hardware support.
       – Primitive data types are used along with one or more type constructors to build the structured types.
       – Primitive data types include integer, real, decimal, character, and Boolean.
  6. Primitive Data Types: Integer
     • Almost always an exact reflection of the hardware, so the mapping is trivial.
     • There may be as many as eight different integer types (differing in size) in a language.
     • Java's signed integer sizes: byte, short, int, long.
     • C# and C++ include unsigned integer types.
     • Signed integers are stored in two's complement representation.
  7. Primitive Data Types: Floating Point
     • Floating-point types model real numbers, but only as approximations (for example, π and e cannot be represented exactly).
     • Problems: approximate representation, and loss of accuracy through arithmetic operations.
     • Languages for scientific use support at least two floating-point types (e.g., float (4 bytes) and double (8 bytes)), sometimes more.
     • Usually exactly like the hardware, but not always.
     • IEEE Floating-Point Standard 754.
     • Precision is the accuracy of the fractional part of a value, measured as the number of bits.
     • Range is a combination of the range of fractions and the range of exponents.
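The two problems listed on this slide can be seen in a few lines of Python. This is only an illustrative sketch: it assumes Python's float is an IEEE 754 double (true on common platforms), and the variable names are made up for the example.

```python
import math

# 0.1, 0.2 and 0.3 have no exact binary representation,
# so the computed sum is only an approximation of 0.3.
total = 0.1 + 0.2
exactly_equal = (total == 0.3)
close_enough = math.isclose(total, 0.3)

# Rounding error accumulates across repeated additions.
accumulated = 0.0
for _ in range(10):
    accumulated += 0.1
drifted = (accumulated != 1.0)

print(exactly_equal, close_enough, drifted)
```

Comparing with a tolerance (math.isclose) rather than == is the usual way to live with the approximation.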
  8. Primitive Data Types: Decimal
     • Computers designed for business applications have hardware support for decimal data types.
     • Decimal types store a fixed number of decimal digits, with the decimal point at a fixed position in the value.
     • For business applications (money):
       – Essential to COBOL
       – C# offers a decimal data type
     • Advantage: accuracy.
     • Disadvantages: limited range, wastes memory.
     • Decimal types are stored in BCD: unpacked (one digit per byte) or packed (two digits per byte).
     • Operations on decimal values are done in hardware or by simulation.
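A short sketch of the accuracy advantage. Python's decimal module is used here as a stand-in: it is software decimal arithmetic, not a BCD hardware type, so it illustrates only the exactness of decimal digits, not the storage format described on the slide.

```python
from decimal import Decimal

# Binary floating point cannot store 0.10 exactly.
float_total = 0.10 * 3

# A decimal type stores the decimal digits themselves.
decimal_total = Decimal("0.10") * 3

float_is_exact = (float_total == 0.30)
decimal_is_exact = (decimal_total == Decimal("0.30"))

print(float_is_exact, decimal_is_exact)
```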
  9. Primitive Data Types: Boolean
     • Simplest of all.
     • Range of values: two elements, one for "true" and one for "false".
     • Boolean types are used to represent switches or flags in programs.
     • Could be implemented as a single bit, but often as a byte.
       – Advantage: readability
  10. Primitive Data Types: Character
     • Stored as numeric codings.
     • Most commonly used coding: ASCII.
     • An alternative, 16-bit coding: Unicode
       – Globalization of business
       – Computers need to communicate with other computers
       – Includes characters from most natural languages
       – The first 128 characters of Unicode are identical to those of ASCII
       – Originally used in Java
       – C# and JavaScript also support Unicode
  11. Character String Types
     • Values are sequences of characters.
     • Design issues:
       – Is it a primitive type or just a special kind of array?
       – Should the length of strings be static or dynamic?
     • C and C++ define strings as arrays of char and provide string operations as functions in the standard library header string.h. Strings are null-terminated (ASCIIZ).
     • Problem: the library functions that move string data do not guard against overflowing the destination. C++ programmers should use the string class from the standard library rather than char arrays.
     • In Java, strings are supported as a primitive type by the String class (constant strings) and the StringBuffer class (variable strings, more like arrays of characters). C# provides similar string classes.
     • Typical operations on strings:
       – Assignment, comparison (=, >, etc.), and copying, which are complicated when operands have different lengths
       – Catenation
       – Substring reference: a reference to a substring of a given string
       – Pattern matching: Perl, JavaScript, and PHP include built-in pattern-matching operations based on regular expressions
  12. Character String Types
     • The pattern expression /[A-Za-z][A-Za-z\d]+/ matches typical names in programming languages.
     • Brackets enclose character classes.
     • The first class specifies all letters; the second specifies all letters and digits (\d abbreviates the digits).
     • The plus specifies that there must be one or more of what is in the class.
     • So the whole pattern matches strings that begin with a letter followed by one or more letters or digits.
     • The pattern expression /\d+\.?\d*|\.\d+/ matches numeric literals.
     • The \. specifies a literal decimal point; the ? quantifies what it follows to have zero or one appearance; the | separates two alternatives in the whole pattern.
     • The first alternative matches strings of one or more digits, possibly followed by a decimal point, followed by zero or more digits.
     • The second alternative matches strings that begin with a decimal point followed by one or more digits.
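The two patterns above can be tried with Python's re module, whose syntax for these constructs matches Perl's. The helper names is_identifier and is_number are assumptions made for this sketch; fullmatch is used so that the whole string, not just a prefix, must fit the pattern.

```python
import re

# The two patterns from the slide, written without the Perl /.../ delimiters.
IDENTIFIER = re.compile(r"[A-Za-z][A-Za-z\d]+")
NUMBER = re.compile(r"\d+\.?\d*|\.\d+")

def is_identifier(text):
    """True if the whole string is a letter followed by one or more letters or digits."""
    return IDENTIFIER.fullmatch(text) is not None

def is_number(text):
    """True if the whole string is a numeric literal such as 42, 3.14, 7. or .5"""
    return NUMBER.fullmatch(text) is not None

print(is_identifier("sum2"), is_identifier("2sum"), is_number(".5"), is_number("."))
```

Note that, exactly as the slide says, the + requires at least one character after the first letter, so a single-letter name such as x does not match.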
  13. Character String Type in Certain Languages
     • C and C++
       – Not primitive
       – Use char arrays and a library of functions that provide operations
     • SNOBOL4 (a string manipulation language)
       – Primitive
       – Many operations, including elaborate pattern matching
     • Java
       – Primitive via the String class
  14. Character String Length Options
     There are several design options regarding string length:
     • Static length strings: the length is set when the string is created, as in COBOL and Java's String class.
     • Limited dynamic length strings: strings of varying length up to a fixed maximum set by the variable's definition, as in C and C++.
       – In the C-based languages, a special character marks the end of a string's characters, rather than the length being maintained.
     • Dynamic length strings: strings of varying length with no maximum, as in SNOBOL4, Perl, and JavaScript.
     • Ada supports all three string length options:
       – Type String from the standard package
       – Type Bounded_String from the Ada.Strings.Bounded package
       – Type Unbounded_String from the Ada.Strings.Unbounded package
     • Character string type evaluation:
       – Strings are an aid to writability.
       – Dealing with strings as arrays can be more cumbersome than dealing with a primitive string type. As a primitive type with static length, strings are inexpensive to provide, so why not have them? (Providing strings through a standard library is nearly as convenient as primitive strings.)
       – Dynamic length is nice and flexible, but is it worth the expense?
  15. Character String Implementation
     • String types could be supported in hardware, but in most cases software is used to implement string storage, retrieval, and manipulation.
     • Static length: a compile-time descriptor with three fields:
       – Name of the type
       – The type's length in characters
       – Address of the first character
     • Limited dynamic length: may need a run-time descriptor to store both the maximum and the current lengths (C and C++ do not need one, because the end of the string is marked with a null character).
     • Dynamic length: needs a run-time descriptor that stores only the current length.
     • Compile-time descriptors are stored in the symbol table.
  16. Compile- and Run-Time Descriptors
     • Allocation and deallocation are the biggest implementation problem for dynamic length strings: the storage must grow and shrink as needed. There are two approaches:
       – Store the string in a linked list: the links occupy extra storage and string operations become more complex, but allocation and deallocation are simple.
       – Store the complete string in adjacent memory cells: this requires less storage and gives faster string operations, but allocation and deallocation are slower.
     • Figures: compile-time descriptor for static strings; run-time descriptor for limited dynamic strings.
  17. User-Defined Ordinal Types
     • An ordinal type is one in which the range of possible values can be easily associated with the set of positive integers.
     • Examples of primitive ordinal types in Java:
       – integer
       – char
       – boolean
  18. Enumeration Types
     • An enumeration type is one in which all possible values, which are named constants, are provided in the definition.
     • Enumeration types provide a way of defining and grouping collections of named constants.
     • Enumeration constants are implicitly assigned the integers 0, 1, … and can be explicitly assigned any integer in the definition.
     • An example in C#: enum days {mon, tue, wed, thu, fri, sat, sun};
     • Design issues (all related to type checking):
       – Is an enumeration constant allowed to appear in more than one type definition, and if so, how is the type of an occurrence of that constant checked?
       – Are enumeration values coerced to integer?
       – Are any other types coerced to an enumeration type?
  19. Enumeration Types
     • In languages that do not have enumeration types, programmers simulate them with integer values. In FORTRAN 77, for example, we might use 0 to represent red, 1 to represent blue, …
         integer red, blue
         data red, blue /0, 1/
     • The problem with this approach is that, because we have not defined a type for our colors, there is no type checking when they are used.
     • With an enumeration type in C++:
         enum colors {red, blue, green, yellow, black};
         colors mycolor = blue, yourColor = red;
     • The enumeration values are coerced to int when they are put in an integer context. For example, if the current value of mycolor is blue, then mycolor++ would assign green to mycolor (this is accepted as written in C; standard C++ accepts ++ on an enumeration only if operator++ is overloaded for it).
     • C++ enumeration constants can appear in only ONE enumeration type in the same referencing environment.
     • C# enumeration types are like those of C++, except that they are never coerced to integers.
  20. Evaluation of Enumerated Type
     • Aid to readability: named values are easily recognized, whereas coded values are not (e.g., no need to code a color as a number).
     • Aid to reliability; e.g., the compiler can check:
       – No arithmetic operations are legal on enumeration types (colors cannot be added).
       – No enumeration variable can be assigned a value outside its defined range.
       – In C++, numeric values can be assigned to enumeration type variables only if they are cast to the type of the assigned variable. Numeric values assigned to enumeration type variables are checked to determine whether they are in the range of the internal values of the enumeration type.
       – Ada, C#, and Java 5.0 provide better support for enumeration than C++, because enumeration type variables in these languages are not coerced into integer types.
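The difference between an enumeration that is never coerced to integer and one that is can be sketched with Python's enum module. Python is not one of the languages discussed on the slide; Enum merely plays the role of the stricter designs (Ada, C#, Java 5.0) and IntEnum the role of the C++-style design. The class and helper names are assumptions for the example.

```python
from enum import Enum, IntEnum

class Color(Enum):            # strict: values are not coerced to int
    RED = 0
    BLUE = 1
    GREEN = 2

class LooseColor(IntEnum):    # loose: values behave like integers
    RED = 0
    BLUE = 1
    GREEN = 2

def try_add(a, b):
    """Return a + b, or None if the types do not allow addition."""
    try:
        return a + b
    except TypeError:
        return None

def try_convert(value):
    """Return the Color for an integer, or None if it is out of range."""
    try:
        return Color(value)
    except ValueError:
        return None

print(try_add(Color.RED, Color.BLUE), try_add(LooseColor.BLUE, LooseColor.BLUE))
print(try_convert(1), try_convert(7))
```

With the strict type, adding two colors is rejected and an out-of-range integer cannot become a color; with the loose type, the addition silently yields a plain integer.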
  21. Subrange Types
     • A subrange type is an ordered contiguous subsequence of an ordinal type. For example, 12..18 is a subrange of the integer type.
     • Ada's design:
       – Subranges are part of subtypes. Subtypes are not new types, but only new names for restricted versions of existing types.
           type Days is (mon, tue, wed, thu, fri, sat, sun);
           subtype Weekdays is Days range mon..fri;
           subtype Index is Integer range 1..100;
       – The restriction on the existing type is in the range of possible values. All operations defined for the parent type are also defined for the subtype.
           Day1: Days;
           Day2: Weekdays;
           Day2 := Day1;  -- legal unless the value of Day1 is sat or sun
  22. Subrange Types
       – The compiler must generate range-checking code for every assignment to a subrange variable.
       – Types are checked for compatibility at compile time; range checking is done at run time.
       – Common uses of user-defined ordinal types: indexes of arrays and loop variables.
       – Subrange types are different from Ada's derived types:
           type Derived_Small_Int is new Integer range 1..100;
           subtype Subrange_Small_Int is Integer range 1..100;
       – Variables of both types inherit the value range and operations of Integer.
       – Variables of type Derived_Small_Int are not compatible with any Integer type.
       – Variables of type Subrange_Small_Int are compatible with variables and constants of Integer type and of any subtype of Integer.
  23. Subrange Evaluation
     • Aid to readability: makes it clear to readers that variables of a subrange type can store only a certain range of values.
     • Reliability: assigning a value outside the specified range to a subrange variable is detected as an error by the compiler or by the run-time system.
     Implementation of User-Defined Ordinal Types
     • Enumeration types are implemented as integers.
     • Subrange types are implemented like their parent types, with code inserted (by the compiler) to restrict assignments to subrange variables.
       – This step increases code size and execution time but is usually considered well worth the cost.
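A rough picture of the range-checking code the compiler inserts for assignments to subrange variables. Python has no subrange types, so the Subrange class and the accepts helper below are hypothetical stand-ins; INDEX mirrors the Ada declaration subtype Index is Integer range 1..100.

```python
class Subrange:
    """A closed integer range low..high with an explicit run-time check."""

    def __init__(self, low, high):
        self.low = low
        self.high = high

    def checked(self, value):
        # This comparison is the extra code a compiler would insert
        # before every assignment to a subrange variable.
        if not (self.low <= value <= self.high):
            raise ValueError(f"{value} is outside {self.low}..{self.high}")
        return value

def accepts(subrange, value):
    """True if assigning value to a variable of this subrange would succeed."""
    try:
        subrange.checked(value)
        return True
    except ValueError:
        return False

INDEX = Subrange(1, 100)
i = INDEX.checked(42)          # passes the check
print(i, accepts(INDEX, 100), accepts(INDEX, 101))
```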
  24. Array Types
     • An array is an aggregate of homogeneous data elements in which an individual element is identified by its position in the aggregate, relative to the first element.
     • A reference to an array element needs one or more subscripts, which require a run-time calculation to determine the memory location being referenced.
     • Array design issues:
       – What types are legal for subscripts?
       – Are subscripting expressions in element references range checked?
       – When are subscript ranges bound?
       – When does allocation take place?
       – What is the maximum number of subscripts?
       – Can array objects be initialized?
       – Are any kinds of slices allowed?
  25. Array Indexing
     • Specific elements are determined by means of a two-level mechanism: the first part is the aggregate name; the second part is a dynamic selector consisting of one or more items known as subscripts or indexes.
     • If all subscripts in a reference are constants, the selector is static; otherwise it is dynamic.
     • Indexing (or subscripting) is a mapping from indices to elements:
         array_name(index_value_list) → an element
     • Index syntax:
       – FORTRAN, PL/I, and Ada use parentheses. Ada explicitly uses parentheses to show uniformity between array references and function calls, because both are mappings.
       – Most other languages use brackets.
       – In languages that provide multidimensional arrays as arrays of arrays, each subscript appears in its own brackets.
  26. Array Index (Subscript) Types
     • Using parentheses to enclose both subscripts and subprogram parameters creates an ambiguity; context information is used to resolve it.
     • Array element references map subscripts to a specific array element.
     • Function calls map parameters to function values.
     • List(59) may be a reference to an array element or a call to a function named List.
     • Subscript types:
       – FORTRAN, C: integer only
       – Pascal: any ordinal type (integer, Boolean, char, enumeration)
       – Ada: integer or enumeration (includes Boolean and char)
       – Java: integer types only
     • Range checking:
       – C, C++, Perl, and Fortran do not specify range checking
       – Java, ML, and C# specify range checking
  27. Subscript Binding and Array Categories
     • There are five categories of arrays; the categories are defined by the binding of subscript ranges and the binding to storage.
     • The category names indicate where and when storage is allocated.
     • A static array is one in which the subscript ranges are statically bound and storage allocation is static (before run time).
       – Advantage: efficiency (no dynamic allocation)
     • A fixed stack-dynamic array is one in which the subscript ranges are statically bound, but the allocation is done at declaration elaboration time during execution.
       – Advantage: space efficiency. A large array in one subprogram can use the same space as a large array in another subprogram, as long as the two subprograms are not active at the same time.
  28. Subscript Binding and Array Categories (continued)
     • A stack-dynamic array is one in which the subscript ranges are dynamically bound and the storage allocation is dynamic (done at run time). Once the subscript ranges are bound and storage is allocated, they remain fixed during the lifetime of the variable.
       – Advantage: flexibility (the size of an array need not be known until the array is to be used)
     • A fixed heap-dynamic array is similar to a fixed stack-dynamic array: storage binding is dynamic but fixed after allocation (i.e., binding is done when requested, and storage is allocated from the heap, not the stack).
     • A heap-dynamic array is one in which the binding of subscript ranges and storage allocation is dynamic and can change any number of times during the array's lifetime.
       – Advantage: flexibility (arrays can grow or shrink during program execution)
  29. Subscript Binding and Array Categories (continued)
     • C and C++ arrays that include the static modifier are static.
     • C and C++ arrays without the static modifier are fixed stack-dynamic.
     • C and C++ provide fixed heap-dynamic arrays (C through malloc and free; C++ through the operators new and delete).
     • Ada arrays can be stack-dynamic.
     • In Java, all arrays are fixed heap-dynamic arrays: once created, they keep the same subscript ranges and storage.
     • C# provides fixed heap-dynamic arrays and includes a second array class, ArrayList, that provides heap-dynamic arrays. Objects of this class are created without any elements; elements are added with the Add method.
     • Perl and JavaScript support heap-dynamic arrays, which can grow and shrink. In Perl, an array of five elements is created with @list = (1,2,3,5,7);
     • It can be lengthened with the push function, as in push(@list, 11, 19);
     • The array can be emptied with @list = ();
  30. Array Initialization
     • Some languages allow initialization at the time of storage allocation.
       – C, C++, Java, C# example:
           int list [] = {4, 5, 7, 83};
       – Character strings in C and C++:
           char name [] = "freddie";
       – Arrays of strings in C and C++:
           char *names [] = {"Bob", "Jake", "Joe"};
       – Java initialization of String objects:
           String[] names = {"Bob", "Jake", "Joe"};
       – Ada:
           List : array (1..5) of Integer := (1, 3, 5, 7, 9);  -- initializes all elements
           Bunch : array (1..5) of Integer := (1 => 17, 3 => 35, others => 0);
         In Bunch, the first and third elements are initialized using direct assignment, and the others clause initializes the remaining elements.
  31. Array Operations
     • An array operation is one that operates on an array as a unit.
     • Ada allows array assignment and also catenation (&). Catenation is defined between two single-dimensioned arrays and between a single-dimensioned array and a scalar.
     • Fortran provides operations that are called elemental because they are operations between pairs of array elements.
       – For example, the + operator between two arrays results in an array of the sums of the element pairs of the two arrays.
       – Library functions are provided for matrix multiplication, transpose, dot product, …
     • APL provides the most powerful array processing operations for vectors and matrices, as well as unary operators (for example, to reverse column elements). See the examples of APL array operations on page 272.
  32. Rectangular and Jagged Arrays
     • A rectangular array is a multi-dimensioned array in which all of the rows have the same number of elements and all of the columns have the same number of elements. All subscripts are placed in a single pair of brackets.
     • A jagged array has rows (or columns) with varying numbers of elements. References use a separate pair of brackets for each dimension, as in a[6][5].
       – Possible when multi-dimensioned arrays are actually arrays of arrays
  33. Slices
     • A slice is some substructure of an array; it is nothing more than a referencing mechanism.
     • Slices are only useful in languages that have array operations.
     • Slice examples in Fortran 95:
         Integer, Dimension (10) :: Vector
         Integer, Dimension (3, 3) :: Mat
         Integer, Dimension (3, 3, 4) :: Cube
       Vector (3:6) is a four-element array.
  34. Slice Examples in Fortran 95 (figure)
  35. Implementation of Arrays
     • Implementing arrays requires more compile-time effort than implementing simple types (such as int).
     • The code to access array elements must be generated at compile time.
     • An access function maps subscript expressions to an address in the array.
     • Access function for single-dimensioned arrays:
         address(list[k]) = address(list[lower_bound]) + ((k - lower_bound) * element_size)
     • The compile-time descriptor for single-dimensioned arrays includes the information needed to construct the access function.
     • If all attributes are static and index range checking is not done at run time, no run-time descriptor is needed.
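The single-dimensioned access function can be written out directly. The function name element_address and the sample numbers (base address 1000, 4-byte elements, lower bound 1) are assumptions chosen only to make the arithmetic concrete.

```python
def element_address(base_address, k, lower_bound, element_size):
    """Address of list[k], given the address of list[lower_bound]."""
    return base_address + (k - lower_bound) * element_size

# list[3] in an array of 4-byte elements that starts at subscript 1, address 1000:
# two elements precede it, so it lies 8 bytes past the base.
print(element_address(1000, 3, 1, 4))
```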
  36. Accessing Multi-dimensioned Arrays
     • Multidimensional arrays are more complex to implement than single-dimensioned arrays.
     • Memory is linear: a simple sequence of bytes.
     • Values of data types that have two or more dimensions must be mapped onto the single-dimensioned memory.
     • Two common ways to store a multidimensional array:
       – Row major order (by rows): used in most languages
       – Column major order (by columns): used in Fortran
     • Sequential access to matrix elements is faster if they are accessed in the order in which they are stored, which minimizes paging.
     • The access function for a multidimensional array is the mapping of its base address and a set of index values to the address in memory of the element specified by the index values.
     • The access function for a two-dimensional array stored in row major order is developed on the following slides.
  37. Row / Column Major Ordering
     The matrix
         11 12 13 14 15
         21 22 23 24 25
         31 32 33 34 35
     is stored as follows:
     • Row major order (second subscript increases faster):
         11 12 13 14 15 21 22 23 24 25 31 32 33 34 35
     • Column major order (first subscript increases faster):
         11 21 31 12 22 32 13 23 33 14 24 34 15 25 35
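The two storage orders for the matrix on this slide can be reproduced with a pair of list comprehensions. This is only an illustration of the orderings; the variable names are assumptions, and real implementations lay the elements out in memory rather than in a Python list.

```python
matrix = [
    [11, 12, 13, 14, 15],
    [21, 22, 23, 24, 25],
    [31, 32, 33, 34, 35],
]
rows = len(matrix)
cols = len(matrix[0])

# Row major: walk each row in turn, so the second subscript varies fastest.
row_major = [matrix[i][j] for i in range(rows) for j in range(cols)]

# Column major: walk each column in turn, so the first subscript varies fastest.
column_major = [matrix[i][j] for j in range(cols) for i in range(rows)]

print(row_major)
print(column_major)
```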
  38. Locating an Element in a Multi-dimensioned Array
     • The address of an element is the base address of the array plus the element size times the number of elements preceding it in the array. With subscripts starting at 1 and n elements per row (row_size = n):
         Loc(a[i, j]) = address of a[1,1] + (number of elements preceding it) * el_size
                      = address of a[1,1] + ((number of rows above the ith row * row_size) + number of elements left of the jth column) * el_size
                      = address of a[1,1] + ((i-1)*n + (j-1)) * el_size
                      = address of a[1,1] + (i*n - n + j - 1) * el_size
                      = address of a[1,1] + ((i*n + j) - (n + 1)) * el_size
                      = address of a[1,1] + (i*n + j) * el_size - (n + 1) * el_size
                      = address of a[1,1] - (n + 1) * el_size + (i*n + j) * el_size
  39. Locating an Element in a Multi-dimensioned Array
     • The address of an element is the base address of the array plus the element size times the number of elements preceding it in the array.
         Location(a[i,j]) = address of a[1,1] + ((number of rows above the ith row * row_size) + number of elements left of the jth column) * element_size
     • General format:
         Location(a[i,j]) = address of a[row_lb, col_lb] - (((row_lb * n) + col_lb) * element_size) + (((i * n) + j) * element_size)
     • The first two terms are the constant part and the last is the variable part.
     • For each dimension of an array, one add and one multiply instruction are required for the access function.
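A sketch of the general row-major access function, split into the constant and variable parts named above. The function name location and the sample values (base address 1000, n = 5 elements per row, 4-byte elements) are assumptions for illustration.

```python
def location(base_address, i, j, n, element_size, row_lb=1, col_lb=1):
    """Address of a[i, j] for a row-major array with n elements per row.

    base_address is the address of a[row_lb, col_lb].
    """
    # Constant part: depends only on the declaration, so it can be computed once.
    constant_part = base_address - ((row_lb * n) + col_lb) * element_size
    # Variable part: one multiply and one add per dimension at each access.
    variable_part = ((i * n) + j) * element_size
    return constant_part + variable_part

# a[2, 3] has one full row (5 elements) above it and 2 elements to its left,
# so 7 elements of 4 bytes each precede it.
print(location(1000, 2, 3, 5, 4))
```

The result agrees with the direct form base + ((i - row_lb) * n + (j - col_lb)) * element_size, which is the same expression before the constant terms are factored out.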
  40. Compile-Time Descriptors (figures: single-dimensioned array; multi-dimensional array)
  41. Associative Arrays
     • An associative array is an unordered collection of data elements that are indexed by an equal number of values called keys.
       – User-defined keys must be stored in the structure.
       – In nonassociative arrays, the indices never need to be stored.
       – Each element of an associative array is a pair of entities: a key and a value.
     • Design issue: what is the form of references to elements?
     • In Perl, associative arrays are called hashes, because their elements are stored and retrieved with hash functions. Every hash variable name begins with %; scalar variable names begin with $. To reference an element, the key value is placed in braces and the hash name is replaced by a scalar variable name that is the same except for the first character.
  42. Associative Arrays in Perl
     • Names begin with %; literals are delimited by parentheses:
         %hi_temps = ("Mon" => 77, "Tue" => 79, "Wed" => 65, …);
     • Subscripting is done using braces and keys:
         $hi_temps{"Wed"} = 83;
       – Elements can be removed with delete:
         delete $hi_temps{"Tue"};
       – The entire hash can be emptied by assigning an empty literal to it:
         %hi_temps = ();
  43. Perl's Associative Arrays
     • Perl has a primitive data type for hash tables, also known as "associative arrays".
     • Elements are indexed not by consecutive integers but by arbitrary keys.
     • %ages refers to an associative array and @people to a regular array.
     • Note the use of { } for associative arrays and [ ] for regular arrays:
         %ages = ("Bill Clinton" => 53, "Hillary" => 51, "Socks" => "27 in cat years");
         $ages{"Hillary"} = 52;
         @people = ("Bill Clinton", "Hillary", "Socks");
         $ages{"Bill Clinton"};   # returns 53
         $people[1];              # returns "Hillary"
     • keys(X), values(X), and each(X):
         foreach $person (keys(%ages)) { print "I know the age of $person\n"; }
         foreach $age (values(%ages)) { print "Somebody is $age\n"; }
         while (($person, $age) = each(%ages)) { print "$person is $age\n"; }
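For comparison, the same operations in Python, where the built-in dict plays the role of Perl's hash and list the role of a regular array. This is a loose translation for illustration, not Perl semantics: for instance, a Python dict preserves insertion order, whereas the slides describe associative arrays as unordered.

```python
# Keys are arbitrary strings; values may be of mixed types, as in the Perl example.
ages = {"Bill Clinton": 53, "Hillary": 51, "Socks": "27 in cat years"}
ages["Hillary"] = 52                  # update by key

people = ["Bill Clinton", "Hillary", "Socks"]

# Counterparts of keys(%ages), values(%ages) and each(%ages).
known = [f"I know the age of {person}" for person in ages.keys()]
somebody = [f"Somebody is {age}" for age in ages.values()]
pairs = [f"{person} is {age}" for person, age in ages.items()]

del ages["Socks"]                     # counterpart of Perl's delete
print(ages["Bill Clinton"], people[1], len(ages))
```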
  44. Record Types
     • A record is a possibly heterogeneous aggregate of data elements in which the individual elements are identified by names.
     • Design issues:
       – What is the syntactic form of references to a field?
       – Are elliptical references allowed?
  45. Definition of Records
     • COBOL uses level numbers to show nested records; other languages use recursive definition.
     • Record field references:
       – COBOL: field_name OF record_name_1 OF ... OF record_name_n
       – Others (dot notation): record_name_1.record_name_2. ... .record_name_n.field_name
     • COBOL example:
         01 EMP-REC.
            02 EMP-NAME.
               05 FIRST PIC X(20).
               05 MID   PIC X(10).
               05 LAST  PIC X(20).
            02 HOURLY-RATE PIC 99V99.
  46. Definition of Records in Ada
     • Record structures are indicated in an orthogonal way:
         type Emp_Rec_Type is record
            First: String (1..20);
            Mid: String (1..10);
            Last: String (1..20);
            Hourly_Rate: Float;
         end record;
         Emp_Rec: Emp_Rec_Type;
  47. References to Records
     • Most languages use dot notation: Emp_Rec.Name
     • Fully qualified references must include all record names.
     • Elliptical references allow record names to be left out as long as the reference is unambiguous. For example, in COBOL, FIRST, FIRST OF EMP-NAME, and FIRST OF EMP-REC are elliptical references to the employee's first name.
     Operations on Records
     • Assignment is very common if the types are identical.
     • Ada allows record comparison.
     • Ada records can be initialized with aggregate literals.
     • COBOL provides MOVE CORRESPONDING.
       – Copies a field of the source record to the corresponding field in the target record
  48. Evaluation and Comparison to Arrays
     • Straightforward and safe design.
     • Records are used when the collection of data values is heterogeneous.
     • Access to array elements is much slower than access to record fields, because subscripts are dynamic (field names are static).
     • Dynamic subscripts could be used with record field access, but that would disallow type checking and would be much slower.
  49. Implementation of Record Type
     • An offset address, relative to the beginning of the record, is associated with each field.
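How the field offsets might be assigned can be sketched as a running sum of field sizes. The sizes below follow the Ada Emp_Rec_Type from the earlier slide under two loudly stated assumptions: one byte per character and a 4-byte Float, with no alignment padding between fields. Real compilers may pad fields, so actual offsets can differ. The function name field_offsets is hypothetical.

```python
def field_offsets(fields):
    """Map each field name to its offset from the start of the record.

    fields is a list of (name, size_in_bytes) pairs in declaration order.
    Returns (offsets, total_size). No alignment padding is modeled.
    """
    offsets = {}
    offset = 0
    for name, size in fields:
        offsets[name] = offset
        offset += size
    return offsets, offset

# Assumed sizes: 1 byte per character, 4 bytes for Float.
emp_rec = [("First", 20), ("Mid", 10), ("Last", 20), ("Hourly_Rate", 4)]
offsets, record_size = field_offsets(emp_rec)
print(offsets, record_size)
```

Because the offsets are known at compile time, a field reference needs no run-time calculation beyond adding a constant to the record's address, which is why field access is cheaper than dynamic subscripting.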
  50. Union Types
     • A union is a type whose variables are allowed to store values of different types at different times during execution.
     • Design issues:
       – Should type checking be required?
       – Should unions be embedded in records?
     Discriminated vs. Free Unions
     • Fortran, C, and C++ provide union constructs in which there is no language support for type checking; the union in these languages is called a free union.
     • Type checking of unions requires that each union include a type indicator called a discriminant.
       – Supported by Ada
  51. Ada Union Types
         type Shape is (Circle, Triangle, Rectangle);
         type Colors is (Red, Green, Blue);
         type Figure (Form: Shape) is record
            Filled: Boolean;
            Color: Colors;
            case Form is
               when Circle => Diameter: Float;
               when Triangle => Leftside, Rightside: Integer;
                                Angle: Float;
               when Rectangle => Side1, Side2: Integer;
            end case;
         end record;
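The checking that a discriminant makes possible can be imitated in Python. This is a hypothetical sketch, not how Ada implements variant records: the Figure class, the VARIANT_FIELDS table, and the can_read helper are invented for the example, and the color is kept as a plain string for brevity.

```python
from enum import Enum

class Shape(Enum):
    CIRCLE = 1
    TRIANGLE = 2
    RECTANGLE = 3

# Fields that exist for each value of the discriminant (mirrors the Ada case part).
VARIANT_FIELDS = {
    Shape.CIRCLE: {"diameter"},
    Shape.TRIANGLE: {"leftside", "rightside", "angle"},
    Shape.RECTANGLE: {"side1", "side2"},
}

class Figure:
    def __init__(self, form, filled, color, **variant):
        # The variant fields supplied must match the discriminant exactly.
        if set(variant) != VARIANT_FIELDS[form]:
            raise TypeError(f"wrong fields for {form.name}")
        self.form = form
        self.filled = filled
        self.color = color
        self._variant = dict(variant)

    def get(self, field):
        # The discriminant is consulted on every access to a variant field.
        if field not in VARIANT_FIELDS[self.form]:
            raise TypeError(f"{field} is not available when form is {self.form.name}")
        return self._variant[field]

def can_read(figure, field):
    """True if the field exists for the figure's current discriminant."""
    try:
        figure.get(field)
        return True
    except TypeError:
        return False

circle = Figure(Shape.CIRCLE, True, "Red", diameter=2.5)
print(circle.get("diameter"), can_read(circle, "side1"))
```

A free union has no counterpart to the check in get, which is what makes reading the wrong variant possible.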
  52. Ada Union Type Illustrated
     • Figure: a discriminated union of three shape variables.
     Evaluation of Unions
     • Potentially unsafe construct: free unions do not allow type checking.
     • Java and C# do not support unions, which reflects growing concern for safety in programming languages.
  53. Pointer and Reference Types
     • A pointer type variable has a range of values consisting of memory addresses and a special value, NULL (nil).
       – Pointers provide the power of indirect addressing.
       – Pointers provide a way to manage dynamic memory.
       – A pointer can be used to access a location in the area where storage is dynamically created (usually called a heap).
     • Design issues of pointers:
       – What are the scope and lifetime of a pointer variable?
       – What is the lifetime of a heap-dynamic variable?
       – Are pointers restricted as to the type of value to which they can point?
       – Are pointers used for dynamic storage management, indirect addressing, or both?
       – Should the language support pointer types, reference types, or both?
  54. Pointer Operations
     • Two fundamental operations: assignment and dereferencing.
     • Assignment is used to set a pointer variable's value to some useful address.
     • Dereferencing yields the value stored at the location represented by the pointer's value.
       – Dereferencing can be explicit or implicit.
       – C++ uses an explicit operation via *:
           j = *ptr
         sets j to the value located at ptr.
  55. 55. Pointer Assignment Illustrated [Figure: the assignment operation j = *ptr]
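The two fundamental pointer operations can be sketched in a few lines of C++. The helper names `read_through` and `write_through` are made up for this illustration; the first wraps the statement `j = *ptr`.

```cpp
#include <cassert>

// Explicit dereference on the right-hand side: copies the pointed-to value.
int read_through(const int* ptr) {
    assert(ptr != nullptr);  // dereferencing a null pointer is undefined
    int j = *ptr;            // j receives the value stored at the address in ptr
    return j;
}

// Explicit dereference on the left-hand side: stores into the pointed-to cell.
void write_through(int* ptr, int value) {
    assert(ptr != nullptr);
    *ptr = value;
}
```

A caller would first perform pointer assignment, for example `int x = 206; int* ptr = &x;`, and then use either helper to read or update `x` indirectly.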
  56. 56. Problems with Pointers • Dangling pointers (dangerous) – A dangling pointer is a pointer that still holds the address of a heap-dynamic variable that has been deallocated (deleted) – Creating one: • Allocate a heap-dynamic variable and set a pointer to point at it • Set a second pointer to the value of the first pointer • Deallocate the heap-dynamic variable, using the first pointer – Example: int *myPtr, *urPtr; myPtr = new int(10); cout << "The value of *myPtr is " << *myPtr << endl; urPtr = myPtr; delete myPtr; /* myPtr and urPtr are now both dangling pointers */ *myPtr = 5; /* error: writes to deallocated storage (undefined behavior) */ cout << "The value of *myPtr is " << *myPtr << endl; /* error: same problem */ • It is an error to dereference a pointer after the storage has been deleted through it or through any of its aliases; every remaining pointer to that storage is a dangling pointer
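One way modern C++ lets a program detect this situation, rather than silently dereference freed storage, is to let an owning `std::shared_ptr` manage the variable and give the alias a non-owning `std::weak_ptr`. This is a substitute technique offered as a sketch, not something the slides prescribe; the function names are invented for the example.

```cpp
#include <cassert>
#include <memory>

// Mirrors the myPtr/urPtr scenario: the alias can tell that its target is gone.
bool observer_dangles_after_release() {
    std::shared_ptr<int> my_ptr = std::make_shared<int>(10);  // owning pointer
    std::weak_ptr<int> ur_ptr = my_ptr;                       // non-owning alias
    assert(!ur_ptr.expired());
    my_ptr.reset();                // deallocates the heap-dynamic int
    return ur_ptr.expired();       // the alias reports that it would dangle
}

// Safe access: lock() yields an empty pointer instead of a dangling one.
int read_or_default(const std::weak_ptr<int>& ptr, int fallback) {
    if (std::shared_ptr<int> locked = ptr.lock()) {
        return *locked;
    }
    return fallback;
}
```

The cost is the reference-count bookkeeping that `std::shared_ptr` performs behind the scenes.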
  57. 57. Problems with Pointers • Lost heap-dynamic variable – An allocated heap-dynamic variable that is no longer accessible to the user program (often called garbage) – Creating one: • Pointer p1 is set to point to a newly created heap-dynamic variable • Pointer p1 is later set to point to another newly created heap-dynamic variable. The first heap-dynamic variable is now lost: it can be neither accessed nor deallocated • Example: void *p1; p1 = new int(10); p1 = new float(7.4); // the int variable (= 10) is lost • The process of losing heap-dynamic variables is called memory leakage
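The leak can be made visible by counting live objects. In the sketch below, `Tracked` and both function names are hypothetical helpers; the second function shows one common C++ remedy, `std::unique_ptr`, which deletes the old object when the pointer is reassigned. Note that the first function leaks one object on purpose.

```cpp
#include <cassert>
#include <memory>

// Hypothetical helper type that counts how many instances are alive.
struct Tracked {
    static int live;
    int value;
    explicit Tracked(int v) : value(v) { ++live; }
    ~Tracked() { --live; }
    Tracked(const Tracked&) = delete;
    Tracked& operator=(const Tracked&) = delete;
};
int Tracked::live = 0;

// Reassigning a raw pointer loses the first object. Returns objects leaked.
int leaked_by_raw_pointer() {
    const int before = Tracked::live;
    Tracked* p1 = new Tracked(10);
    p1 = new Tracked(7);   // the Tracked(10) object is now unreachable garbage
    assert(p1->value == 7);
    delete p1;             // only the second object can still be freed
    return Tracked::live - before;
}

// Reassigning a unique_ptr deletes the old object first. Returns objects leaked.
int leaked_by_unique_ptr() {
    const int before = Tracked::live;
    {
        std::unique_ptr<Tracked> p1 = std::make_unique<Tracked>(10);
        p1 = std::make_unique<Tracked>(7);  // Tracked(10) is destroyed here
        assert(p1->value == 7);
    }                                       // Tracked(7) is destroyed here
    return Tracked::live - before;
}
```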
  58. 58. Pointers in Ada • Some dangling pointers are disallowed because dynamic objects can be automatically deallocated at the end of the pointer type's scope • All pointers are initialized to null • The lost heap-dynamic variable problem is not eliminated by Ada
  59. 59. Pointers in C and C++ • Extremely flexible but must be used with care • Pointers can point at any variable regardless of when it was allocated • Used for dynamic storage management and addressing • Pointer arithmetic is possible • Explicit dereferencing and address-of operators • Example: float stuff[100]; float *p; p = stuff; *(p+5) is equivalent to stuff[5] and p[5]; *(p+i) is equivalent to stuff[i] and p[i] • Domain type need not be fixed (void *) • void * can point to any type and can be type checked, but cannot be dereferenced
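The equivalences on this slide, and the restriction on `void *`, can be checked directly. This is a minimal sketch; the function names are invented for the illustration.

```cpp
#include <cassert>
#include <cstddef>

// Checks that *(p+i), p[i], and stuff[i] all denote the same element.
bool index_forms_agree(std::size_t i) {
    assert(i < 100);
    float stuff[100];
    for (std::size_t k = 0; k < 100; ++k) {
        stuff[k] = static_cast<float>(k);
    }
    float* p = stuff;  // the array name converts to a pointer to its first element
    return *(p + i) == stuff[i] && p[i] == stuff[i] && (p + i) == &stuff[i];
}

// A void* can hold the address of any object, but it must be cast back
// to a typed pointer before it can be dereferenced.
float read_float_via_void(void* generic) {
    assert(generic != nullptr);
    return *static_cast<float*>(generic);
}
```

Writing `*generic` directly would be rejected by the compiler, which is the sense in which `void *` remains type checked.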
  60. 60. Pointers in Fortran 95 • Pointers point to heap and non-heap variables • Implicit dereferencing • Pointers can only point to variables that have the TARGET attribute • A pointer is declared with the POINTER attribute: REAL, POINTER :: ptr • Pointer assignment uses =>: ptr => target (where target is either a pointer or a non-pointer with the TARGET attribute) • The TARGET attribute is assigned in the declaration, e.g. INTEGER, TARGET :: NODE
  61. 61. Reference Types • C++ includes a special kind of pointer type called a reference type that is used primarily for formal parameters – Advantages of both pass-by-reference and pass-by-value • Java extends C++'s reference variables and allows them to replace pointers entirely – References refer to class instances • C# includes both the references of Java and the pointers of C++
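A small comparison of the three parameter styles available in C++ may help here. The function names are made up for this sketch; the reference version gives the callee access to the caller's variables without any explicit dereferencing.

```cpp
#include <cassert>

// Pass-by-value: the callee swaps its own copies; the caller sees no change.
void swap_by_value(int a, int b) {
    int t = a; a = b; b = t;
}

// Reference parameters: a and b are aliases for the caller's variables,
// implicitly dereferenced on every use.
void swap_by_reference(int& a, int& b) {
    int t = a; a = b; b = t;
}

// Pointer parameters: the same effect, but with explicit * at each use
// and explicit & at the call site.
void swap_by_pointer(int* a, int* b) {
    assert(a != nullptr && b != nullptr);
    int t = *a; *a = *b; *b = t;
}
```

Unlike a pointer, a C++ reference cannot be null and cannot be reseated to refer to a different variable after initialization.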
  62. 62. Evaluation of Pointers • Dangling pointers and dangling objects are problems, as is heap management • Pointers are like gotos: they widen the range of cells that can be accessed by a variable • Pointers or references are necessary for dynamic data structures, so we cannot design a language without them Representations of Pointers • Large computers use single values • Intel microprocessors use segment and offset
  63. 63. Solving the Dangling Pointer Problem • Tombstone: an extra heap cell that is a pointer to the heap-dynamic variable – The actual pointer variable points only at tombstones – When the heap-dynamic variable is deallocated, the tombstone remains but is set to nil – Costly in time and space • Locks-and-keys: pointer values are represented as (key, address) pairs – Heap-dynamic variables are represented as the variable plus a cell for an integer lock value – When a heap-dynamic variable is allocated, a lock value is created and placed in both the lock cell and the key cell of the pointer – Every access through the pointer compares its key with the lock; a mismatch is a run-time error – When the heap-dynamic variable is deallocated, its lock is set to an illegal value, so every remaining pointer fails the comparison
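Both schemes can be sketched in C++. Everything below is an illustrative assumption rather than how any particular language implements them: the type names are invented, values are plain `int`s, tombstones are never reclaimed (which is part of their classic cost), and the lock cells are kept alive by the sketch so that comparing a stale key is well defined.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// --- Tombstones: user pointers refer to a tombstone, never to the variable ---
struct Tombstone { int* target; };

class TombstoneHeap {
public:
    Tombstone* allocate(int value) {
        stones_.push_back(std::make_unique<Tombstone>(Tombstone{new int(value)}));
        return stones_.back().get();
    }
    void deallocate(Tombstone* t) {
        assert(t != nullptr);
        delete t->target;
        t->target = nullptr;  // the tombstone remains, but is set to nil
    }
    ~TombstoneHeap() {
        for (auto& s : stones_) delete s->target;  // deleting nullptr is harmless
    }
private:
    std::vector<std::unique_ptr<Tombstone>> stones_;
};

// Every dereference costs one extra level of indirection.
int* deref(Tombstone* t) {
    assert(t != nullptr);
    return t->target;  // nullptr once the variable has been deallocated
}

// --- Locks-and-keys: pointers are (key, address) pairs ---
struct LockedCell { unsigned lock; int value; };
struct KeyedPtr { unsigned key; LockedCell* address; };

class LockHeap {
public:
    KeyedPtr allocate(int value) {
        const unsigned lock = next_lock_++;
        cells_.push_back(std::make_unique<LockedCell>(LockedCell{lock, value}));
        return KeyedPtr{lock, cells_.back().get()};
    }
    // An access is legal only while the pointer's key matches the cell's lock.
    static bool valid(KeyedPtr p) { return p.address->lock == p.key; }
    void deallocate(KeyedPtr p) {
        if (valid(p)) p.address->lock = 0;  // 0 serves as the illegal lock value
    }
private:
    unsigned next_lock_ = 1;
    std::vector<std::unique_ptr<LockedCell>> cells_;
};
```

In both cases copying a pointer is cheap, and the check happens at each access, which is where the run-time overhead comes from.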
  64. 64. Heap Management • Memory management: identify unused, dynamically allocated memory cells and return them to the heap • Approaches – Manual: explicit allocation and deallocation (C, C++) – Automatic: • Reference counters (Modula-2, Adobe Photoshop) • Garbage collection (Lisp, Java) • Problems with the manual approach: – Requires programmer effort – Programmer errors lead to space leaks and dangling references/sharing – Proper explicit memory management is difficult and has been estimated to account for up to 40% of development time • Heap management is a very complex run-time process • Single-size cells vs. variable-size cells • Two approaches to reclaiming garbage – Reference counters (eager approach): reclamation is gradual – Garbage collection (lazy approach): reclamation occurs when the list of available space becomes empty
  65. 65. Reference Counter • Idea: keep track of how many references there are to a cell in memory. If this number drops to 0, the cell is garbage • Reference counters: maintain a counter in every cell that stores the number of pointers currently pointing at the cell • Store garbage in a free list; allocate from this list • Advantages – Resources can be freed directly – Immediate reuse of memory is possible • Disadvantages – Cannot reclaim cyclic (circularly connected) data structures – Bad locality properties – Large overhead for pointer manipulation: extra space for the counters and extra execution time to update them
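C++'s `std::shared_ptr` is a reference-counted pointer, so it can be used to observe both the counting itself and the cyclic-structure weakness. In this sketch `Node` and the function names are hypothetical, and the last function leaks two nodes on purpose to show the problem.

```cpp
#include <cassert>
#include <memory>

// Hypothetical list node that counts how many instances are alive.
struct Node {
    static int live;
    std::shared_ptr<Node> next;
    Node() { ++live; }
    ~Node() { --live; }
};
int Node::live = 0;

// Copying a pointer increments the counter stored with the cell.
long count_after_alias() {
    std::shared_ptr<Node> a = std::make_shared<Node>();
    assert(a.use_count() == 1);
    std::shared_ptr<Node> b = a;   // second pointer to the same cell
    return a.use_count();
}

// A non-cyclic chain is reclaimed as soon as the counts reach zero.
int leaked_by_chain() {
    const int before = Node::live;
    {
        auto a = std::make_shared<Node>();
        auto b = std::make_shared<Node>();
        a->next = b;
    }
    return Node::live - before;
}

// In a cycle each node keeps the other's count above zero forever.
int leaked_by_cycle() {
    const int before = Node::live;
    {
        auto a = std::make_shared<Node>();
        auto b = std::make_shared<Node>();
        a->next = b;
        b->next = a;               // circular connection
    }
    return Node::live - before;
}
```

In practice C++ programs break such cycles by making one of the links a `std::weak_ptr`.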
  66. 66. Garbage Collection • GC is a process by which dynamically allocated storage is reclaimed during the execution of a program • Usually refers to automatic periodic storage reclamation by the garbage collector (part of the run-time system), as opposed to explicit code to free specific blocks of memory • Usually triggered during memory allocation when available free memory falls below a threshold. Normal execution is suspended and GC is run • The run-time system allocates storage cells as requested and disconnects pointers from cells as necessary; garbage collection then begins – Every heap cell has an extra bit used by the collection algorithm – All cells are initially set to garbage – All pointers are traced into the heap, and reachable cells are marked as not garbage – All garbage cells are returned to the list of available cells – Disadvantage: when you need it most, it works worst (it takes the most time when the program needs most of the cells in the heap) • Major GC algorithms: – Mark and sweep – Copying – Incremental garbage collection algorithms
  67. 67. Marking Algorithm
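A mark-and-sweep pass over a heap of single-size cells can be sketched as follows. The representation is an assumption made for illustration: cells live in a vector, "pointers" are cell indices, the roots are passed in explicitly, and marking uses an explicit stack instead of recursion.

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <vector>

struct Cell {
    bool marked = false;            // the extra bit used by the collector
    bool in_use = false;            // false means the cell is on the free list
    std::vector<std::size_t> refs;  // indices of the cells this cell points to
};

class Heap {
public:
    explicit Heap(std::size_t n) : cells_(n) {}

    std::size_t allocate() {
        for (std::size_t i = 0; i < cells_.size(); ++i) {
            if (!cells_[i].in_use) { cells_[i].in_use = true; return i; }
        }
        throw std::bad_alloc();  // a real system would trigger collection here
    }
    void link(std::size_t from, std::size_t to) {
        assert(from < cells_.size() && to < cells_.size());
        cells_[from].refs.push_back(to);
    }
    bool in_use(std::size_t i) const { return cells_.at(i).in_use; }

    // Returns the number of cells reclaimed.
    std::size_t collect(const std::vector<std::size_t>& roots) {
        for (Cell& c : cells_) c.marked = false;  // all cells start as garbage
        std::vector<std::size_t> pending(roots);  // mark phase: trace pointers
        while (!pending.empty()) {
            const std::size_t i = pending.back();
            pending.pop_back();
            if (!cells_[i].in_use || cells_[i].marked) continue;
            cells_[i].marked = true;              // reachable: not garbage
            for (std::size_t r : cells_[i].refs) pending.push_back(r);
        }
        std::size_t reclaimed = 0;                // sweep phase
        for (Cell& c : cells_) {
            if (c.in_use && !c.marked) { c = Cell{}; ++reclaimed; }
        }
        return reclaimed;
    }
private:
    std::vector<Cell> cells_;
};
```

Because reachability is computed from the roots, two unreachable cells that point at each other are reclaimed like any other garbage.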
  68. 68. Variable-Size Cells • All the difficulties of single-size cells plus more • Required by most programming languages • If garbage collection is used, additional problems occur – The initial setting of the indicators of all cells in the heap is difficult – The marking process is nontrivial – Maintaining the list of available space is another source of overhead
  69. 69. Summary • The data types of a language are a large part of what determines that language’s style and usefulness • The primitive data types of most imperative languages include numeric, character, and Boolean types • The user-defined enumeration and subrange types are convenient and add to the readability and reliability of programs • Arrays and records are included in most languages • Pointers are used for addressing flexibility and to control dynamic storage management