2. OUTLINE
RAID
File Organization – Organization of Records in Files
Indexing and Hashing
Ordered Indices
B+ tree Index Files – B tree Index Files
Static Hashing – Dynamic Hashing
Query Processing Overview
Algorithms for SELECT and JOIN operations
Query optimization using Heuristics and Cost Estimation
Prepared by R.Arthy, AP/IT, KCET
4. CLASSIFICATION OF PHYSICAL STORAGE
MEDIA
Can differentiate storage into:
volatile storage: loses contents when power is switched off
non-volatile storage:
Contents persist even when power is switched off.
Includes secondary and tertiary storage, as well as battery-backed-up
main memory.
Factors affecting choice of storage media include
Speed with which data can be accessed
Cost per unit of data
Reliability
6. [CONTD…]
primary storage: Fastest media but volatile (cache, main
memory).
secondary storage: next level in hierarchy, non-volatile,
moderately fast access time
Also called on-line storage
E.g., flash memory, magnetic disks
tertiary storage: lowest level in hierarchy, non-volatile, slow
access time
also called off-line storage and used for archival storage
e.g., magnetic tape, optical storage
Magnetic tape
Sequential access, 1 to 12 TB capacity
A few drives with many tapes
Juke boxes with petabytes (1000’s of TB) of storage
7. RAID
RAID: Redundant Arrays of Independent Disks
Disk organization techniques that manage a large number of
disks, providing the view of a single disk of
high capacity and high speed by using multiple disks in parallel,
high reliability by storing data redundantly, so that data can be
recovered even if a disk fails
The chance that some disk out of a set of N disks will fail
is much higher than the chance that a specific single disk
will fail.
E.g., a system with 100 disks, each with MTTF of 100,000
hours (approx. 11 years), will have a system MTTF of 1000
hours (approx. 41 days)
Techniques for using redundancy to avoid data loss are critical
with large numbers of disks
8. IMPROVEMENT OF RELIABILITY VIA REDUNDANCY
Redundancy – store extra information that can be used to
rebuild information lost in a disk failure
E.g., Mirroring (or shadowing)
Duplicate every disk. Logical disk consists of two physical disks.
Every write is carried out on both disks
Reads can take place from either disk
If one disk in a pair fails, data still available in the other
Data loss would occur only if a disk fails, and its mirror disk also fails
before the system is repaired
Probability of combined event is very small
Except for dependent failure modes such as fire or building collapse or
electrical power surges
Mean time to data loss depends on mean time to failure,
and mean time to repair
E.g., MTTF of 100,000 hours, mean time to repair of 10 hours gives
mean time to data loss of 500 × 10^6 hours (or about 57,000 years) for a
mirrored pair of disks (ignoring dependent failure modes)
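The arithmetic behind this figure can be checked with the standard independent-failure approximation, MTDL ≈ MTTF² / (2 · MTTR), using the slide's numbers:

```python
# Mean time to data loss for a mirrored pair, assuming independent failures:
# a second failure must occur within the repair window of the first, giving
# MTDL ~= MTTF^2 / (2 * MTTR).
mttf = 100_000          # hours, per-disk mean time to failure
mttr = 10               # hours, mean time to repair
mtdl = mttf ** 2 / (2 * mttr)
print(mtdl)             # 5e8 hours, i.e. 500 * 10^6
print(mtdl / 8760)      # hours -> years: roughly 57,000 years
```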
9. IMPROVEMENT IN PERFORMANCE VIA PARALLELISM
Two main goals of parallelism in a disk system:
1. Load balance multiple small accesses to increase throughput
2. Parallelize large accesses to reduce response time.
Improve transfer rate by striping data across multiple disks.
Bit-level striping – split the bits of each byte across multiple
disks
In an array of eight disks, write bit i of each byte to disk i.
Each access can read data at eight times the rate of a single disk.
But seek/access time worse than for a single disk
Bit level striping is not used much any more
Block-level striping – with n disks, block i of a file goes to
disk (i mod n) + 1
Requests for different blocks can run in parallel if the blocks reside on
different disks
A request for a long sequence of blocks can utilize all disks in parallel
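The block-to-disk mapping above is a one-liner; a minimal sketch (disk numbering 1..n as on the slide):

```python
def disk_for_block(i: int, n: int) -> int:
    """With n disks numbered 1..n, block i of a file goes to disk (i mod n) + 1."""
    return (i % n) + 1

# consecutive blocks land on different disks, so they can be read in parallel
placement = [disk_for_block(i, 4) for i in range(8)]
print(placement)        # [1, 2, 3, 4, 1, 2, 3, 4]
```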
10. RAID LEVELS
Schemes to provide redundancy at lower cost by using disk
striping combined with parity bits
Different RAID organizations, or RAID levels, have differing cost,
performance and reliability characteristics
RAID Level 0: Block striping; non-redundant.
Used in high-performance applications where data loss is not critical.
RAID Level 1: Mirrored disks with block striping
Offers best write performance.
Popular for applications such as storing log files in a database system.
11. [CONTD…]
RAID Level 2: Memory-Style Error-Correcting-Codes
(ECC) with bit striping.
RAID Level 3: Bit-Interleaved Parity
a single parity bit is enough for error correction, not just
detection, since we know which disk has failed
When writing data, corresponding parity bits must also be computed
and written to a parity bit disk
To recover data in a damaged disk, compute XOR of bits from other
disks (including parity bit disk)
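The parity computation and the recovery step are both plain bytewise XOR; a minimal sketch with three illustrative data blocks:

```python
from functools import reduce

def xor_blocks(blocks):
    """Bytewise XOR of equal-length blocks (parity computation and recovery)."""
    return bytes(reduce(lambda x, y: x ^ y, col) for col in zip(*blocks))

data = [b"\x0f\xa0", b"\x33\x5c", b"\xf0\x01"]   # blocks on three data disks
p = xor_blocks(data)                              # block on the parity disk
# disk 2 fails: XOR of the surviving blocks and the parity block restores it,
# since p = d0 ^ d1 ^ d2 implies d0 ^ d2 ^ p = d1
recovered = xor_blocks([data[0], data[2], p])
assert recovered == data[1]
```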
12. [CONTD…]
RAID Level 3 (Cont.)
Faster data transfer than with a single disk, but fewer I/Os per
second since every disk has to participate in every I/O.
Subsumes Level 2 (provides all its benefits, at lower cost).
RAID Level 4: Block-Interleaved Parity; uses block-level
striping, and keeps a parity block on a separate disk for
corresponding blocks from N other disks.
When writing data block, corresponding block of parity bits must
also be computed and written to parity disk
To find value of a damaged block, compute XOR of bits from
corresponding blocks (including parity block) from other disks.
13. [CONTD…]
RAID Level 4 (Cont.)
Provides higher I/O rates for independent block reads than
Level 3
block read goes to a single disk, so blocks stored on different disks
can be read in parallel
Provides higher transfer rates for reads of multiple blocks than
a non-striped system
Before writing a block, parity data must be computed
Can be done by using old parity block, old value of current block and
new value of current block (2 block reads + 2 block writes)
Or by recomputing the parity value using the new values of blocks
corresponding to the parity block
More efficient for writing large amounts of data sequentially
Parity block becomes a bottleneck for independent block
writes since every block write also writes to the parity disk
14. [CONTD…]
RAID Level 5: Block-Interleaved Distributed Parity;
partitions data and parity among all N + 1 disks, rather than
storing data in N disks and parity in 1 disk.
E.g., with 5 disks, parity block for nth set of blocks is stored on disk
(n mod 5) + 1, with the data blocks stored on the other 4 disks.
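The rotating parity placement from the example can be sketched directly (disks numbered 1..5 as on the slide):

```python
def parity_disk(n: int, ndisks: int = 5) -> int:
    """Disk (numbered 1..ndisks) holding the parity block of the n-th block set."""
    return (n % ndisks) + 1

# parity rotates across all disks instead of living on one dedicated disk
print([parity_disk(n) for n in range(6)])   # [1, 2, 3, 4, 5, 1]
```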
15. [CONTD…]
RAID Level 5 (Cont.)
Block writes occur in parallel if the blocks and their parity blocks are
on different disks.
RAID Level 6: P+Q Redundancy scheme; similar to Level 5,
but stores two error correction blocks (P, Q) instead of single
parity block to guard against multiple disk failures.
Better reliability than Level 5 at a higher cost
Becoming more important as storage sizes increase
16. CHOICE OF RAID LEVEL
Factors in choosing RAID level
Monetary cost
Performance: Number of I/O operations per second, and bandwidth
during normal operation
Performance during failure
Performance during rebuild of failed disk
Including time taken to rebuild failed disk
RAID 0 is used only when data safety is not important
E.g., data can be recovered quickly from other sources
Levels 2 and 4 are never used since they are subsumed by Levels 3 and 5
Level 3 is not used anymore since bit-striping forces single
block reads to access all disks, wasting disk arm movement,
which block striping (level 5) avoids
Level 6 is rarely used since levels 1 and 5 offer adequate safety
for most applications
17. [CONTD…]
Level 1 provides much better write performance than level 5
Level 5 requires at least 2 block reads and 2 block writes to write a
single block, whereas Level 1 only requires 2 block writes
Level 1 has higher storage cost than level 5
Level 5 is preferred for applications where writes are sequential
and large (many blocks), and need large amounts of data storage
RAID 1 is preferred for applications with many random/small
updates
Level 6 gives better data protection than RAID 5 since it can
tolerate two disk (or disk block) failures
Increasing in importance since latent block failures on one disk,
coupled with a failure of another disk can result in data loss with
RAID 1 and RAID 5.
18. HARDWARE ISSUES
Software RAID: RAID implementations done entirely in
software, with no special hardware support
Hardware RAID: RAID implementations with special
hardware
Use non-volatile RAM to record writes that are being executed
Beware: power failure during write can result in corrupted
disk
E.g., failure after writing one block but before writing the second in a
mirrored system
Such corrupted data must be detected when power is restored
Recovery from corruption is similar to recovery from failed disk
NV-RAM helps to efficiently detect potentially corrupted blocks
Otherwise all blocks of disk must be read and compared with
mirror/parity block
19. [CONTD…]
Latent failures: data successfully written earlier gets damaged
can result in data loss even if only one disk fails
Data scrubbing:
continually scan for latent failures, and recover from copy/parity
Hot swapping: replacement of disk while system is running,
without power down
Supported by some hardware RAID systems,
reduces time to recovery, and improves availability greatly
Many systems maintain spare disks which are kept online, and
used as replacements for failed disks immediately on detection
of failure
Reduces time to recovery greatly
Many hardware RAID systems ensure that a single point of
failure will not stop the functioning of the system by using
Redundant power supplies with battery backup
Multiple controllers and multiple interconnections to guard against
controller/interconnection failures
20. OPTIMIZATION OF DISK-BLOCK ACCESS
Buffering: in-memory buffer to cache disk blocks
Read-ahead: Read extra blocks from a track in
anticipation that they will be requested soon
Disk-arm-scheduling algorithms re-order block requests
so that disk arm movement is minimized
elevator algorithm
22. INTRODUCTION
The database is stored as a collection of files.
Each file is a sequence of records.
A record is a sequence of fields.
One approach:
assume record size is fixed
each file has records of one particular type only
different files are used for different relations
This case is easiest to implement; will consider variable
length records later.
23. FIXED LENGTH RECORD
Simple approach:
Store record i starting from byte n ∗ (i − 1), where n is the
size of each record.
Record access is simple but records may cross blocks.
Deletion of record i — alternatives:
move records i + 1,...,n to i, . . . , n − 1
move record n to i
Link all free records on a free list
24. EXAMPLE
type instructor = record
ID varchar(5);
name varchar(20);
dept_name varchar(20);
salary numeric(8,2);
end
instructor record is 53 bytes long.
Two problems:
Unless the block size happens to be a multiple of 53 (which is
unlikely), some records will cross block boundaries.
It is difficult to delete a record from this structure.
27. FILE HEADER AND FREE LIST
Store the address of the first record whose contents are
deleted in the file header.
Use this first record to store the address of the second
available record, and so on.
Can think of these stored addresses as pointers since they
“point” to the location of a record.
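The header-plus-free-list scheme can be sketched with a Python list standing in for the file of fixed-length records (slot contents are illustrative):

```python
# Free-list bookkeeping over a fixed-length record file: the file header
# points at the first deleted slot, and each deleted slot stores the next.
class RecordFile:
    def __init__(self, records):
        self.slots = list(records)
        self.free_head = None          # header field: first deleted slot

    def delete(self, i):
        # reuse the deleted record's own space to hold the next-free pointer
        self.slots[i] = ("FREE", self.free_head)
        self.free_head = i

    def insert(self, rec):
        if self.free_head is None:     # no holes: append at the end
            self.slots.append(rec)
            return len(self.slots) - 1
        i = self.free_head             # pop the head of the free list
        self.free_head = self.slots[i][1]
        self.slots[i] = rec
        return i

f = RecordFile(["r0", "r1", "r2"])
f.delete(1); f.delete(0)               # free list: header -> 0 -> 1
assert f.insert("new") == 0            # most recently freed slot reused first
assert f.insert("newer") == 1
```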
28. [CONTD…]
More space efficient representation: reuse space for
normal attributes of free records to store pointers. (No
pointers stored in in-use records.)
Dangling pointers occur if we move or delete a record to
which another record contains a pointer; that pointer no
longer points to the desired record.
Avoid moving or deleting records that are pointed to by
other records; such records are pinned.
29. VARIABLE LENGTH RECORD
Variable-length records arise in database systems in
several ways:
Storage of multiple record types in a file.
Record types that allow variable lengths for one or more
fields.
Record types that allow repeating fields (used in some older
data models).
Byte string representation
Attach an end-of-record (⊥) control character to the end of
each record
Difficulty with deletion
Difficulty with growth
30. SLOTTED PAGE STRUCTURE
Header contains:
number of record entries
end of free space in the block
location and size of each record
Records can be moved around within a page to keep them
contiguous with no empty space between them; entry in the header
must then be updated.
Pointers should not point directly to record — instead they should
point to the entry for the record in header.
31. ORGANIZATION OF RECORDS IN
FILES
Heap – a record can be placed anywhere in the file where
there is space
Sequential – store records in sequential order, based on
the value of the search key of each record
Hashing – a hash function is computed on some attribute
of each record; the result specifies in which block of the
file the record should be placed
Clustering – records of several different relations can be
stored in the same file; related records are stored on the
same block
32. SEQUENTIAL FILE ORGANIZATION
Suitable for applications that require sequential
processing of the entire file
The records in the file are ordered by a search-key
33. [CONTD…]
Deletion – use pointer chains
Insertion – must locate the position in the file where the
record is to be inserted
if there is free space insert there
if no free space, insert the record in an overflow block
In either case, pointer chain must be updated
Need to reorganize the file from time to time to restore
sequential order
34. CLUSTERING FILE ORGANIZATION
Simple file structure stores each relation in a separate file
Can instead store several relations in one file using a
clustering file organization
E.g., clustering organization of department and employee:
36. INTRODUCTION
Indexing mechanisms used to speed up access to desired
data.
E.g., author catalog in library
Search Key - attribute or set of attributes used to look up
records in a file.
An index file consists of records (called index entries) of
the form (search-key, pointer)
Index files are typically much smaller than the original
file
Two basic kinds of indices:
Ordered indices: search keys are stored in some sorted order
Hash indices: search keys are distributed uniformly across
“buckets” using a “hash function”.
37. INDEX EVALUATION METRICS
Access types: The types of access that are supported
efficiently.
Access time: The time it takes to find a particular data
item, or set of items, using the technique in question.
Insertion time: The time it takes to insert a new data item.
Deletion time: The time it takes to delete a data item.
Space overhead: The additional space occupied by an
index structure.
39. INTRODUCTION
In an ordered index, index entries are stored sorted on the
search key value.
E.g., author catalog in library.
Primary index: in a sequentially ordered file, the index whose
search key specifies the sequential order of the file.
Also called clustering index
The search key of a primary index is usually but not necessarily the
primary key.
Secondary index: an index whose search key specifies an
order different from the sequential order of the file. Also
called non-clustering index.
Index-sequential file: ordered sequential file with a primary
index.
44. INDEX UPDATE
Insertion
First, the system performs a lookup using the search-key value
that appears in the record to be inserted. The actions the
system takes next depend on whether the index is dense or
sparse:
Dense indices:
1. If the search-key value does not appear in the index, the
system inserts an index entry with the search-key value in the
index at the appropriate position.
2. Otherwise the following actions are taken:
If the index entry stores pointers to all records with the same search
key value, the system adds a pointer to the new record in the index
entry.
Otherwise, the index entry stores a pointer to only the first record
with the search-key value. The system then places the record being
inserted after the other records with the same search-key values.
45. [CONTD…] DELETION
Dense indices:
1. If the deleted record was the only record with its
particular search-key value, then the system deletes the
corresponding index entry from the index.
2. Otherwise the following actions are taken:
If the index entry stores pointers to all records with the same
search key value, the system deletes the pointer to the deleted
record from the index entry.
Otherwise, the index entry stores a pointer to only the first
record with the search-key value. In this case, if the deleted
record was the first record with the search-key value, the
system updates the index entry to point to the next record.
46. [CONTD…]
Sparse indices:
1. If the index does not contain an index entry with the
search-key value of the deleted record, nothing needs to
be done to the index.
2. Otherwise the system takes the following actions:
If the deleted record was the only record with its search key,
the system replaces the corresponding index record with an
index record for the next search-key value (in search-key
order). If the next search-key value already has an index
entry, the entry is deleted instead of being replaced.
Otherwise, if the index entry for the search-key value points
to the record being deleted, the system updates the index
entry to point to the next record with the same search-key
value.
48. SELECT OPERATIONS
File scan – search algorithms that locate and retrieve
records that fulfill a selection condition.
Algorithm A1 (linear search). Scan each file block and
test all records to see whether they satisfy the selection
condition.
Cost estimate = br block transfers + 1 seek
br denotes number of blocks containing records from relation r
If selection is on a key attribute, can stop on finding record
cost = (br/2) block transfers + 1 seek, on average
Linear search can be applied regardless of
selection condition or
ordering of records in the file, or
availability of indices
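The A1 cost model is simple enough to evaluate directly; a minimal sketch with an illustrative (assumed) relation size:

```python
# Linear-search (A1) cost under the slide's model; br is an assumed figure.
br = 400                       # blocks holding records of relation r
cost_any = br                  # general case: scan every block (+ 1 seek)
cost_key_avg = br / 2          # equality on a key: stop at the match, on average
print(cost_any, cost_key_avg)  # 400 200.0
```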
49. [CONTD…]
A2 (binary search). Applicable if selection is an
equality comparison on the attribute on which file is
ordered.
Assume that the blocks of a relation are stored contiguously
Cost estimate (number of disk blocks to be scanned):
cost of locating the first tuple by a binary search on the blocks
⌈log2(br)⌉ ∗ (tT + tS)
If there are multiple records satisfying selection
Add transfer cost of the number of blocks containing records that
satisfy selection condition
50. [CONTD…]
Index scan – search algorithms that use an index
selection condition must be on search-key of index.
A3 (primary index on candidate key, equality). Retrieve a
single record that satisfies the corresponding equality
condition
Cost = (hi + 1) * (tT + tS)
A4 (primary index on nonkey, equality) Retrieve multiple
records.
Records will be on consecutive blocks
Let b = number of blocks containing matching records
Cost = hi * (tT + tS) + tS + tT * b
A5 (equality on search-key of secondary index).
Retrieve a single record if the search-key is a candidate key
Cost = (hi + 1) * (tT + tS)
Retrieve multiple records if search-key is not a candidate key
each of n matching records may be on a different block
Cost = (hi + n) * (tT + tS)
Can be very expensive!
51. [CONTD…]
Can implement selections of the form σA≤v(r) or σA≥v(r) by
using
a linear file scan or binary search,
or by using indices in the following ways:
A6 (primary index, comparison). (Relation is sorted on A)
For σA≥v(r) use index to find first tuple ≥ v and scan relation
sequentially from there
For σA≤v(r) just scan relation sequentially till first tuple > v; do not use
index
A7 (secondary index, comparison).
For σA≥v(r) use index to find first index entry ≥ v and scan index
sequentially from there, to find pointers to records.
For σA≤v(r) just scan leaf pages of index finding pointers to records, till
first entry > v
In either case, retrieve records that are pointed to
requires an I/O for each record
Linear file scan may be cheaper
52. [CONTD…]
Conjunction: σθ1 ∧ θ2 ∧ ... ∧ θn(r)
A8 (conjunctive selection using one index).
Select a combination of θi and algorithms A1 through A7 that results
in the least cost for σθi(r).
Test other conditions on tuple after fetching it into memory buffer.
A9 (conjunctive selection using multiple-key index).
Use appropriate composite (multiple-key) index if available.
A10 (conjunctive selection by intersection of identifiers).
Requires indices with record pointers.
Use corresponding index for each condition, and take intersection of
all the obtained sets of record pointers.
Then fetch records from file
If some conditions do not have appropriate indices, apply test in
memory.
53. [CONTD…]
Disjunction: σθ1 ∨ θ2 ∨ ... ∨ θn(r).
A11 (disjunctive selection by union of identifiers).
Applicable if all conditions have available indices.
Otherwise use linear scan.
Use corresponding index for each condition, and take union
of all the obtained sets of record pointers.
Then fetch records from file
Negation: σ¬θ(r)
Use linear scan on file
If very few records satisfy ¬θ, and an index is applicable to ¬θ
Find satisfying records using index and fetch from file
54. JOIN OPERATIONS
Several different algorithms to implement joins
Nested-loop join
Block nested-loop join
Indexed nested-loop join
Merge-join
Hash-join
Choice based on cost estimate
Examples use the following information
Number of records of customer: 10,000 depositor: 5000
Number of blocks of customer: 400 depositor: 100
55. NESTED – LOOP JOIN
To compute the theta join r ⋈θ s
for each tuple tr in r do begin
for each tuple ts in s do begin
test pair (tr,ts) to see if they satisfy the join condition
if they do, add tr • ts to the result.
end
end
r is called the outer relation and s the inner relation of the
join.
Requires no indices and can be used with any kind of join
condition.
Expensive since it examines every pair of tuples in the two
relations.
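The loop structure above can be sketched directly; a minimal version over in-memory relations (relation and attribute names are illustrative):

```python
def nested_loop_join(r, s, theta):
    """Tuple-at-a-time nested-loop join; r is the outer relation."""
    result = []
    for tr in r:                       # outer loop over r
        for ts in s:                   # inner loop over s
            if theta(tr, ts):          # test the join condition
                result.append({**tr, **ts})
    return result

depositor = [{"cust": "A", "acct": 1}, {"cust": "B", "acct": 2}]
customer = [{"cust": "A", "city": "X"}, {"cust": "C", "city": "Y"}]
out = nested_loop_join(depositor, customer,
                       lambda tr, ts: tr["cust"] == ts["cust"])
print(out)    # [{'cust': 'A', 'acct': 1, 'city': 'X'}]
```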
56. [CONTD…]
In the worst case, if there is enough memory only to hold one block of each relation,
the estimated cost is
nr ∗ bs + br
block transfers, plus
nr + br
seeks
If the smaller relation fits entirely in memory, use that as the inner relation.
Reduces cost to br + bs block transfers and 2 seeks
Assuming worst case memory availability cost estimate is
with depositor as outer relation:
5000 ∗ 400 + 100 = 2,000,100 block transfers,
5000 + 100 = 5100 seeks
with customer as the outer relation
10000 ∗ 100 + 400 = 1,000,400 block transfers and 10,400 seeks
If smaller relation (depositor) fits entirely in memory, the cost estimate will be 500
block transfers.
Block nested-loops algorithm is preferable.
57. BLOCK NESTED - LOOP JOIN
Variant of nested-loop join in which every block of inner
relation is paired with every block of outer relation.
for each block Br of r do begin
for each block Bs of s do begin
for each tuple tr in Br do begin
for each tuple ts in Bs do begin
Check if (tr,ts) satisfy the join condition
if they do, add tr • ts to the result.
end
end
end
end
58. [CONTD…]
Worst case estimate: br ∗ bs + br block transfers + 2 ∗ br seeks
Each block in the inner relation s is read once for each block in the outer
relation (instead of once for each tuple in the outer relation)
Best case: br + bs block transfers + 2 seeks.
Improvements to nested loop and block nested loop algorithms:
In block nested-loop, use M − 2 disk blocks as blocking unit for the outer
relation, where M = memory size in blocks; use remaining two blocks
to buffer inner relation and output
Cost = ⌈br / (M − 2)⌉ ∗ bs + br block transfers + 2 ∗ ⌈br / (M − 2)⌉ seeks
If equi-join attribute forms a key on inner relation, stop inner loop on
first match
Scan inner loop forward and backward alternately, to make use of the
blocks remaining in buffer (with LRU replacement)
Use index on inner relation if available
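The worst-case cost formula above can be evaluated as a small function; with M = 3 it reduces to the basic algorithm and reproduces the figures in the example on the next slide:

```python
import math

def block_nlj_cost(br, bs, M):
    """Worst-case block transfers and seeks for block nested-loop join,
    buffering the outer relation M - 2 blocks at a time."""
    outer_chunks = math.ceil(br / (M - 2))
    transfers = outer_chunks * bs + br   # inner relation read once per chunk
    seeks = 2 * outer_chunks             # one seek per relation per chunk
    return transfers, seeks

# depositor (100 blocks) as outer, customer (400 blocks) as inner:
print(block_nlj_cost(100, 400, M=3))     # (40100, 200)
print(block_nlj_cost(100, 400, M=22))    # (2100, 10) -- more memory helps
```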
59. INDEXED NESTED - LOOP JOIN
Index lookups can replace file scans if
join is an equi-join or natural join and
an index is available on the inner relation’s join attribute
Can construct an index just to compute a join.
For each tuple tr in the outer relation r, use the index to look up
tuples in s that satisfy the join condition with tuple tr.
Worst case: buffer has space for only one page of r, and, for each
tuple in r, we perform an index lookup on s.
Cost of the join: br ∗ (tT + tS) + nr ∗ c
Where c is the cost of traversing index and fetching all matching s
tuples for one tuple of r
c can be estimated as cost of a single selection on s using the join
condition.
If indices are available on join attributes of both r and s,
use the relation with fewer tuples as the outer relation.
60. EXAMPLE
Compute depositor ⋈ customer, with depositor as the outer relation.
Let customer have a primary B+-tree index on the join attribute
customer-name, which contains 20 entries in each index node.
Since customer has 10,000 tuples, the height of the tree is 4, and one
more access is needed to find the actual data
depositor has 5000 tuples
Cost of block nested loops join
400*100 + 100 = 40,100 block transfers + 2 * 100 = 200 seeks
assuming worst case memory
may be significantly less with more memory
Cost of indexed nested loops join
100 + 5000 * 5 = 25,100 block transfers and seeks.
CPU cost likely to be less than that for block nested loops join
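The indexed nested-loops estimate in this example is a one-line computation, counting each (tT + tS) as one cost unit:

```python
# Slide example: depositor (100 blocks, 5000 tuples) as outer relation;
# B+-tree of height 4 on customer-name, plus 1 access for the data => c = 5.
br, nr, c = 100, 5000, 5
cost = br + nr * c      # br * (tT + tS) + nr * c, in (tT + tS) units
print(cost)             # 25100
```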
61. MERGE JOIN
1. Sort both relations on their join
attribute (if not already sorted on
the join attributes).
2. Merge the sorted relations to join
them
1. Join step is similar to the
merge stage of the sort-merge
algorithm.
2. Main difference is handling of
duplicate values in join
attribute — every pair with
same value on join attribute
must be matched
62. [CONTD…]
Can be used only for equi-joins and natural joins
Each block needs to be read only once (assuming all tuples for any
given value of the join attributes fit in memory)
Thus the cost of merge join is:
br + bs block transfers + ⌈br / bb⌉ + ⌈bs / bb⌉ seeks
+ the cost of sorting if relations are unsorted.
hybrid merge-join: If one relation is sorted, and the other has a
secondary B+-tree index on the join attribute
Merge the sorted relation with the leaf entries of the B+-tree .
Sort the result on the addresses of the unsorted relation’s tuples
Scan the unsorted relation in physical address order and merge with
previous result, to replace addresses by the actual tuples
Sequential scan more efficient than random lookup
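The merge step, including the duplicate handling noted above, can be sketched over in-memory relations (attribute names are illustrative; sorting stands in for the external sort phase):

```python
def merge_join(r, s, key):
    """Equi-join by merging; both inputs are sorted on `key` first, and
    duplicate join-attribute values are paired exhaustively."""
    r = sorted(r, key=lambda t: t[key])
    s = sorted(s, key=lambda t: t[key])
    out, i, j = [], 0, 0
    while i < len(r) and j < len(s):
        if r[i][key] < s[j][key]:
            i += 1
        elif r[i][key] > s[j][key]:
            j += 1
        else:
            v = r[i][key]
            j_end = j                         # find the run of v-tuples in s
            while j_end < len(s) and s[j_end][key] == v:
                j_end += 1
            while i < len(r) and r[i][key] == v:
                for jj in range(j, j_end):    # every pair with the same value
                    out.append({**r[i], **s[jj]})
                i += 1
            j = j_end
    return out

r = [{"k": 1, "a": 10}, {"k": 2, "a": 20}, {"k": 2, "a": 30}]
s = [{"k": 2, "b": 1}, {"k": 3, "b": 2}]
print(merge_join(r, s, "k"))   # both k=2 tuples of r match the one in s
```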
63. HASH JOIN
Applicable for equi-joins and natural joins.
A hash function h is used to partition tuples of both relations
Intuition: partitions fit in memory
h maps JoinAttrs values to {0, 1, ..., n}, where JoinAttrs
denotes the common attributes of r and s used in the natural
join.
r0, r1, . . ., rn denote partitions of r tuples
Each tuple tr ∈ r is put in partition ri where i = h(tr[JoinAttrs]).
s0, s1, . . ., sn denote partitions of s tuples
Each tuple ts ∈ s is put in partition si, where i = h(ts[JoinAttrs]).
Note: In book, ri is denoted as Hri, si is denoted as Hsi and
n is denoted as nh.
64. [CONTD…]
r tuples in ri need only to be
compared with s tuples in si
Need not be compared with s
tuples in any other partition,
since:
an r tuple and an s tuple that
satisfy the join condition will
have the same value for the join
attributes.
If that value is hashed to some
value i, the r tuple has to be in
ri and the s tuple in si.
65. [CONTD…]
The hash-join of r and s is computed as follows.
1. Partition the relation s using hashing function h.
1. When partitioning a relation, one block of memory is
reserved as the output buffer for each partition, and one
block for input
2. If extra memory is available, allocate bb blocks as buffer for
input and each output
2.Partition r similarly.
66. [CONTD…]
3. For each partition i:
(a) Load si into memory and build an in-memory hash index on
it using the join attribute.
This hash index uses a different hash function than the earlier one
h.
(b) Read the tuples in ri from the disk one by one.
For each tuple tr probe the in-memory hash index to find all
matching tuples ts in si
For each matching tuple ts in si
output the concatenation of the attributes of tr and ts
Relation s is called the build input and
r is called the probe input.
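Steps 1–3 can be sketched in a few lines over in-memory relations; here the in-memory index is keyed directly on the join value, standing in for the "different hash function" of step 3(a):

```python
from collections import defaultdict

def hash_join(r, s, key, n=4):
    """Hash join sketch: partition both relations with h, then per partition
    build an in-memory index on s_i (build input) and probe it with r_i."""
    h = lambda t: hash(t[key]) % n
    r_parts, s_parts = defaultdict(list), defaultdict(list)
    for t in r:
        r_parts[h(t)].append(t)
    for t in s:
        s_parts[h(t)].append(t)
    out = []
    for i in range(n):
        index = defaultdict(list)          # build phase: hash index on s_i
        for ts in s_parts[i]:
            index[ts[key]].append(ts)
        for tr in r_parts[i]:              # probe phase: one lookup per r_i tuple
            for ts in index.get(tr[key], []):
                out.append({**tr, **ts})
    return out

depositor = [{"cust": "A"}, {"cust": "B"}]
customer = [{"cust": "A", "city": "X"}]
print(hash_join(depositor, customer, "cust"))   # [{'cust': 'A', 'city': 'X'}]
```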
67. [CONTD…]
The value n and the hash function h are chosen such that each si
should fit in memory.
Typically n is chosen as ⌈bs/M⌉ ∗ f, where f is a “fudge factor”,
typically around 1.2
The probe relation partitions ri need not fit in memory
Recursive partitioning required if number of partitions n is greater
than number of pages M of memory.
instead of partitioning n ways, use M – 1 partitions for s
Further partition the M – 1 partitions using a different hash
function
Use same partitioning method on r
Rarely required: e.g., recursive partitioning not needed for
relations of 1GB or less with memory size of 2MB, with block
size of 4KB.
68. HANDLING OVERFLOW
Partitioning is said to be skewed if some partitions have
significantly more tuples than some others
Hash-table overflow occurs in partition si if si does not fit in
memory. Reasons could be
Many tuples in s with same value for join attributes
Bad hash function
Overflow resolution can be done in build phase
Partition si is further partitioned using different hash function.
Partition ri must be similarly partitioned.
Overflow avoidance performs partitioning carefully to avoid
overflows during build phase
E.g. partition build relation into many partitions, then combine them
Both approaches fail with large numbers of duplicates
Fallback option: use block nested loops join on overflowed partitions
69. [CONTD…]
If recursive partitioning is not required: cost of hash join is
3(br + bs) + 4 ∗ nh block transfers +
2(⌈br / bb⌉ + ⌈bs / bb⌉) seeks
If recursive partitioning required:
number of passes required for partitioning build relation
s is ⌈logM−1(bs)⌉ − 1
best to choose the smaller relation as the build relation.
Total cost estimate is:
2(br + bs)(⌈logM−1(bs)⌉ − 1) + br + bs block transfers +
2(⌈br / bb⌉ + ⌈bs / bb⌉)(⌈logM−1(bs)⌉ − 1) seeks
If the entire build input can be kept in main memory no partitioning
is required
Cost estimate goes down to br + bs.
70. EXAMPLE
Assume that memory size is 20 blocks
bdepositor= 100 and bcustomer = 400.
depositor is to be used as build input. Partition it into
five partitions, each of size 20 blocks. This partitioning
can be done in one pass.
Similarly, partition customer into five partitions,each of
size 80. This is also done in one pass.
Therefore total cost, ignoring cost of writing partially
filled blocks:
3(100 + 400) = 1500 block transfers +
2(⌈100/3⌉ + ⌈400/3⌉) = 336 seeks
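The two totals above can be checked directly (bb = 3 buffer blocks, as the seek figure implies):

```python
import math

br, bs, bb = 100, 400, 3
transfers = 3 * (br + bs)     # read + write during partitioning, read to probe
seeks = 2 * (math.ceil(br / bb) + math.ceil(bs / bb))
print(transfers, seeks)       # 1500 336
```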
71. HYBRID HASH JOIN
Useful when memory sizes are relatively large, and the build input is
bigger than memory.
Main feature of hybrid hash join:
Keep the first partition of the build relation in memory.
E.g. With memory size of 25 blocks, depositor can be partitioned
into five partitions, each of size 20 blocks.
Division of memory:
The first partition occupies 20 blocks of memory
1 block is used for input, and 1 block each for buffering the other 4 partitions.
customer is similarly partitioned into five partitions each of size 80
the first is used right away for probing, instead of being written out
Cost of 3(80 + 320) + 20 + 80 = 1300 block transfers for
hybrid hash join, instead of 1500 with plain hash join.
Hybrid hash join is most useful if M >> √bs
73. INTRODUCTION
Alternative ways of evaluating a given query
Equivalent expressions
Different algorithms for each operation
74. [CONTD…]
An evaluation plan defines exactly what algorithm is
used for each operation, and how the execution of the
operations is coordinated.
75. [CONTD…]
Cost difference between evaluation plans for a query can be
enormous
E.g. seconds vs. days in some cases
Steps in cost-based query optimization
1. Generate logically equivalent expressions using equivalence rules
2. Annotate resultant expressions to get alternative query plans
3. Choose the cheapest plan based on estimated cost
Estimation of plan cost based on:
Statistical information about relations. Examples:
number of tuples, number of distinct values for an attribute
Statistics estimation for intermediate results
to compute cost of complex expressions
Cost formulae for algorithms, computed using statistics
76. GENERATING EQUIVALENT EXPRESSIONS –
TRANSFORMATION OF RELATIONAL EXPRESSIONS
Two relational algebra expressions are said to be equivalent if the
two expressions generate the same set of tuples on every legal
database instance
Note: order of tuples is irrelevant
In SQL, inputs and outputs are multisets of tuples
Two expressions in the multiset version of the relational algebra
are said to be equivalent if the two expressions generate the same
multiset of tuples on every legal database instance.
An equivalence rule says that expressions of two forms are
equivalent
Can replace expression of first form by second, or vice versa
77. GENERATING EQUIVALENT EXPRESSIONS –
EQUIVALENCE RULE
1. Conjunctive selection operations can be deconstructed into a
sequence of individual selections:
σθ1∧θ2(E) = σθ1(σθ2(E))
2. Selection operations are commutative:
σθ1(σθ2(E)) = σθ2(σθ1(E))
3. Only the last in a sequence of projection operations is
needed; the others can be omitted:
ΠL1(ΠL2(…(ΠLn(E))…)) = ΠL1(E)
4. Selections can be combined with Cartesian products and
theta joins:
a. σθ(E1 × E2) = E1 ⋈θ E2
b. σθ1(E1 ⋈θ2 E2) = E1 ⋈θ1∧θ2 E2
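Rules 1, 2 and 4a can be checked mechanically on toy relations. A minimal Python sketch, modelling relations as lists of tuples and predicates as functions (all names here are illustrative):

```python
from itertools import product

def select(pred, rel):
    # sigma_pred(rel): keep the tuples satisfying the predicate
    return [t for t in rel if pred(t)]

def cartesian(r, s):
    # r x s: concatenate each pair of tuples
    return [a + b for a, b in product(r, s)]

r = [("A", 1), ("B", 2)]
s = [(1, "p"), (2, "q")]
t1 = lambda t: t[1] == 2        # theta1
t2 = lambda t: t[0] == "B"      # theta2

# Rule 1: conjunctive selection deconstructs into a sequence of selections
assert select(lambda t: t1(t) and t2(t), r) == select(t1, select(t2, r))
# Rule 2: selections commute
assert select(t1, select(t2, r)) == select(t2, select(t1, r))
# Rule 4a: sigma_theta(E1 x E2) equals the theta-join of E1 and E2
theta = lambda t: t[1] == t[2]  # join condition: r's 2nd col = s's 1st col
assert select(theta, cartesian(r, s)) == \
       [a + b for a in r for b in s if a[1] == b[0]]
print("all equivalences hold")
```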
78. [CONTD…]
5. Theta-join operations (and natural joins) are
commutative:
E1 ⋈θ E2 = E2 ⋈θ E1
6. (a) Natural join operations are associative:
(E1 ⋈ E2) ⋈ E3 = E1 ⋈ (E2 ⋈ E3)
(b) Theta joins are associative in the following manner:
(E1 ⋈θ1 E2) ⋈θ2∧θ3 E3 = E1 ⋈θ1∧θ3 (E2 ⋈θ2 E3)
where θ2 involves attributes from only E2 and E3.
80. [CONTD…]
7. The selection operation distributes over the theta-join
operation under the following two conditions:
(a) When all the attributes in θ0 involve only the attributes
of one of the expressions (E1) being joined:
σθ0(E1 ⋈θ E2) = (σθ0(E1)) ⋈θ E2
(b) When θ1 involves only the attributes of E1 and θ2
involves only the attributes of E2:
σθ1∧θ2(E1 ⋈θ E2) = (σθ1(E1)) ⋈θ (σθ2(E2))
81. [CONTD…]
8. The projection operation distributes over the theta-join
operation as follows:
(a) if θ involves only attributes from L1 ∪ L2:
ΠL1∪L2(E1 ⋈θ E2) = (ΠL1(E1)) ⋈θ (ΠL2(E2))
(b) Consider a join E1 ⋈θ E2.
Let L1 and L2 be sets of attributes from E1 and E2,
respectively.
Let L3 be attributes of E1 that are involved in join condition θ,
but are not in L1 ∪ L2, and
let L4 be attributes of E2 that are involved in join condition θ,
but are not in L1 ∪ L2. Then:
ΠL1∪L2(E1 ⋈θ E2) = ΠL1∪L2((ΠL1∪L3(E1)) ⋈θ (ΠL2∪L4(E2)))
82. [CONTD…]
9. The set operations union and intersection are commutative:
E1 ∪ E2 = E2 ∪ E1
E1 ∩ E2 = E2 ∩ E1
(set difference is not commutative).
10. Set union and intersection are associative:
(E1 ∪ E2) ∪ E3 = E1 ∪ (E2 ∪ E3)
(E1 ∩ E2) ∩ E3 = E1 ∩ (E2 ∩ E3)
11. The selection operation distributes over ∪, ∩ and –:
σθ(E1 – E2) = σθ(E1) – σθ(E2)
and similarly for ∪ and ∩ in place of –
Also: σθ(E1 – E2) = σθ(E1) – E2
and similarly for ∩ in place of –, but not for ∪
12. The projection operation distributes over union:
ΠL(E1 ∪ E2) = (ΠL(E1)) ∪ (ΠL(E2))
83. EXAMPLE
Query: Find the names of all customers with an account at a
Brooklyn branch whose account balance is over $1000.
Πcustomer_name(σbranch_city = “Brooklyn” ∧ balance > 1000
(branch ⋈ (account ⋈ depositor)))
Transformation using join associativity (Rule 6a):
Πcustomer_name(σbranch_city = “Brooklyn” ∧ balance > 1000
((branch ⋈ account) ⋈ depositor))
Second form provides an opportunity to apply the “perform
selections early” rule, resulting in the subexpression
σbranch_city = “Brooklyn”(branch) ⋈ σbalance > 1000(account)
Thus a sequence of transformations can be useful
85. TRANSFORMATION EXAMPLE: PUSHING PROJECTIONS
When we compute
(σbranch_city = “Brooklyn”(branch)) ⋈ account
we obtain a relation whose schema is:
(branch_name, branch_city, assets, account_number, balance)
Push projections using equivalence rules 8a and 8b; eliminate
unneeded attributes from intermediate results to rewrite
Πcustomer_name((σbranch_city = “Brooklyn”(branch) ⋈ account) ⋈ depositor)
as
Πcustomer_name((Πaccount_number(σbranch_city = “Brooklyn”(branch) ⋈ account)) ⋈ depositor)
Performing the projection as early as possible reduces the size of the
relation to be joined.
86. JOIN ORDERING EXAMPLE
For all relations r1, r2, and r3,
(r1 ⋈ r2) ⋈ r3 = r1 ⋈ (r2 ⋈ r3)
(Join Associativity)
If r2 ⋈ r3 is quite large and r1 ⋈ r2 is small, we choose
(r1 ⋈ r2) ⋈ r3
so that we compute and store a smaller temporary relation.
87. JOIN ORDERING EXAMPLE (CONT.)
Consider the expression
Πcustomer_name((σbranch_city = “Brooklyn”(branch)) ⋈ (account ⋈ depositor))
Could compute account ⋈ depositor first, and join the result with
σbranch_city = “Brooklyn”(branch)
but account ⋈ depositor is likely to be a large relation.
Only a small fraction of the bank’s customers are likely to
have accounts in branches located in Brooklyn
it is better to compute
σbranch_city = “Brooklyn”(branch) ⋈ account
first.
88. GENERATING EQUIVALENT EXPRESSIONS –
ENUMERATION OF EQUIVALENT EXPRESSIONS
Query optimizers use equivalence rules to systematically
generate expressions equivalent to the given expression
Can generate all equivalent expressions as follows:
Repeat
apply all applicable equivalence rules on every equivalent expression
found so far
add newly generated expressions to the set of equivalent expressions
Until no new equivalent expressions are generated
The above approach is very expensive in space and time
Two approaches
Optimized plan generation based on transformation rules
Special case approach for queries with only selections, projections and
joins
89. GENERATING EQUIVALENT EXPRESSIONS – IMPLEMENTING
TRANSFORMATION BASED OPTIMIZATION
Space requirements reduced by sharing common sub-expressions:
when E1 is generated from E2 by an equivalence rule, usually only the top
level of the two are different, subtrees below are the same and can be shared
using pointers
E.g. when applying join commutativity
Same sub-expression may get generated multiple times
Detect duplicate sub-expressions and share one copy
Time requirements are reduced by not generating all expressions
Dynamic programming
We will study only the special case of dynamic programming for join order
optimization
90. COST ESTIMATION
Cost of each operator computed as described in Chapter 13
Need statistics of input relations
E.g. number of tuples, sizes of tuples
Inputs can be results of sub-expressions
Need to estimate statistics of expression results
To do so, we require additional statistics
E.g. number of distinct values for an attribute
More on cost estimation later
91. CHOICE OF EVALUATION PLANS
Must consider the interaction of evaluation techniques when
choosing evaluation plans
choosing the cheapest algorithm for each operation independently
may not yield best overall algorithm. E.g.
merge-join may be costlier than hash-join, but may provide a sorted
output which reduces the cost for an outer level aggregation.
nested-loop join may provide opportunity for pipelining
Practical query optimizers incorporate elements of the
following two broad approaches:
1. Search all the plans and choose the best plan in a
cost-based fashion.
2. Use heuristics to choose a plan.
92. COST-BASED OPTIMIZATION
Consider finding the best join-order for r1 ⋈ r2 ⋈ . . . ⋈ rn.
There are (2(n – 1))!/(n – 1)! different join orders for the above
expression. With n = 7, the number is 665,280; with n = 10,
the number is about 17.6 billion!
No need to generate all the join orders. Using dynamic
programming, the least-cost join order for any subset of
{r1, r2, . . . rn} is computed only once and stored for future
use.
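The formula can be evaluated directly (a quick sketch using only the standard library):

```python
from math import factorial

def num_join_orders(n):
    # (2(n-1))! / (n-1)! complete join orders for n relations:
    # n! leaf orderings times the Catalan number of binary tree shapes
    return factorial(2 * (n - 1)) // factorial(n - 1)

print(num_join_orders(7))   # 665280
print(num_join_orders(10))  # 17643225600
```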
93. DYNAMIC PROGRAMMING IN OPTIMIZATION
To find best join tree for a set of n relations:
To find best plan for a set S of n relations, consider all possible
plans of the form: S1 ⋈ (S – S1) where S1 is any non-empty
subset of S.
Recursively compute costs for joining subsets of S to find the
cost of each plan. Choose the cheapest of the 2^n – 1
alternatives.
Base case for recursion: single relation access plan
Apply all selections on Ri using best choice of indices on Ri
When plan for any subset is computed, store it and reuse it
when it is required again, instead of recomputing it
Dynamic programming
94. JOIN ORDER OPTIMIZATION ALGORITHM
procedure findbestplan(S)
if (bestplan[S].cost ≠ ∞)
return bestplan[S]
// else bestplan[S] has not been computed earlier, compute it now
if (S contains only 1 relation)
set bestplan[S].plan and bestplan[S].cost based on the best way
of accessing S /* using selections on S and indices on S */
else for each non-empty subset S1 of S such that S1 ≠ S
P1 = findbestplan(S1)
P2 = findbestplan(S – S1)
A = best algorithm for joining results of P1 and P2
cost = P1.cost + P2.cost + cost of A
if cost < bestplan[S].cost
bestplan[S].cost = cost
bestplan[S].plan = “execute P1.plan; execute P2.plan;
join results of P1 and P2 using A”
return bestplan[S]
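A runnable version of the procedure, under a deliberately simplified cost model (joining two plans is assumed to cost, and produce, the product of their sizes; the relation statistics are invented — a real optimizer substitutes catalog-based estimates):

```python
from itertools import combinations

def find_best_plan(S, sizes, best=None):
    """Dynamic-programming join ordering over a set S of relation names.
    Returns (plan tree, cost, result size)."""
    if best is None:
        best = {}                      # memo table: bestplan[S]
    S = frozenset(S)
    if S in best:                      # computed earlier: reuse it
        return best[S]
    if len(S) == 1:
        r = next(iter(S))
        plan = (r, 0, sizes[r])        # base case: single relation access
    else:
        plan = None
        for k in range(1, len(S)):     # every non-empty proper subset S1
            for S1 in map(frozenset, combinations(S, k)):
                p1 = find_best_plan(S1, sizes, best)
                p2 = find_best_plan(S - S1, sizes, best)
                cost = p1[1] + p2[1] + p1[2] * p2[2]
                if plan is None or cost < plan[1]:
                    plan = ((p1[0], p2[0]), cost, p1[2] * p2[2])
    best[S] = plan
    return plan

tree, cost, size = find_best_plan({"r1", "r2", "r3"},
                                  {"r1": 10, "r2": 1000, "r3": 20})
print(cost)  # 200200 — join the two small relations first
```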
95. LEFT DEEP JOIN TREES
In left-deep join trees, the right-hand-side input for
each join is a relation, not the result of an
intermediate join.
96. COST OF OPTIMIZATION
With dynamic programming, the time complexity of optimization with bushy
trees is O(3^n).
With n = 10, this number is about 59,000 instead of about 17.6 billion!
Space complexity is O(2^n)
To find best left-deep join tree for a set of n relations:
Consider n alternatives with one relation as right-hand-side input and the
other relations as left-hand-side input.
Modify optimization algorithm:
Replace “for each non-empty subset S1 of S such that S1 ≠ S”
By: for each relation r in S
let S1 = S – r.
If only left-deep trees are considered, time complexity of finding best
join order is O(n 2^n)
Space complexity remains at O(2^n)
Cost-based optimization is expensive, but worthwhile for queries on
large datasets (typical queries have small n, generally < 10)
97. INTERESTING SORT ORDERS
Consider the expression (r1 ⋈ r2) ⋈ r3 (with A as the common attribute)
An interesting sort order is a particular sort order of tuples that could
be useful for a later operation
Using merge-join to compute r1 ⋈ r2 may be costlier than hash join but
generates a result sorted on A
This in turn may make merge-join with r3 cheaper, reducing the
overall cost of the query
Sort order may also be useful for order by and for grouping
Not sufficient to find the best join order for each subset of the set of n
given relations
must find the best join order for each subset, for each interesting sort order
Simple extension of earlier dynamic programming algorithms
Usually, number of interesting orders is quite small and doesn’t affect
time/space complexity significantly
98. HEURISTIC OPTIMIZATION
Cost-based optimization is expensive, even with dynamic programming.
Systems may use heuristics to reduce the number of choices that must
be made in a cost-based fashion.
Heuristic optimization transforms the query-tree by using a set of rules
that typically (but not in all cases) improve execution performance:
Perform selection early (reduces the number of tuples)
Perform projection early (reduces the number of attributes)
Perform most restrictive selection and join operations (i.e. with smallest
result size) before other similar operations.
Some systems use only heuristics, others combine heuristics with partial
cost-based optimization.
99. STRUCTURE OF QUERY OPTIMIZERS
Many optimizers consider only left-deep join orders.
Plus heuristics to push selections and projections down the query
tree
Reduces optimization complexity and generates plans amenable
to pipelined evaluation.
Heuristic optimization used in some versions of Oracle:
Repeatedly pick “best” relation to join next
Starting from each of n starting points. Pick best among these
Intricacies of SQL complicate query optimization
E.g. nested subqueries
100. [CONTD…]
Some query optimizers integrate heuristic selection and the generation
of alternative access plans.
Frequently used approach
heuristic rewriting of nested block structure and aggregation
followed by cost-based join-order optimization for each block
Some optimizers (e.g. SQL Server) apply transformations to entire
query and do not depend on block structure
Even with the use of heuristics, cost-based query optimization imposes
a substantial overhead.
But is worth it for expensive queries
Optimizers often use simple heuristics for very cheap queries, and
perform exhaustive enumeration for more expensive queries
101. STATISTICS FOR COST ESTIMATION - STATISTICAL
INFORMATION FOR COST ESTIMATION
nr: number of tuples in a relation r.
br: number of blocks containing tuples of r.
lr: size of a tuple of r.
fr: blocking factor of r — i.e., the number of tuples of r that fit into
one block.
V(A, r): number of distinct values that appear in r for attribute A;
same as the size of ΠA(r).
If tuples of r are stored together physically in a file, then:
br = ⌈nr / fr⌉
102. STATISTICS FOR COST ESTIMATION -
HISTOGRAMS
Histogram on attribute age of relation person
Equi-width histograms
Equi-depth histograms
103. STATISTICS FOR COST ESTIMATION - SELECTION SIZE
ESTIMATION
σA=v(r)
nr / V(A,r): number of records that will satisfy the selection
Equality condition on a key attribute: size estimate = 1
σA≤v(r) (case of σA≥v(r) is symmetric)
Let c denote the estimated number of tuples satisfying the
condition.
If min(A,r) and max(A,r) are available in catalog
c = 0 if v < min(A,r)
c = nr · (v – min(A,r)) / (max(A,r) – min(A,r)) otherwise
If histograms available, can refine above estimate
In absence of statistical information c is assumed to be nr / 2.
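The interpolation estimate for σA≤v can be written as a small helper (a sketch; the catalog values passed in are invented):

```python
def estimate_le(v, n_r, a_min, a_max):
    """Estimated number of tuples satisfying A <= v, by linear
    interpolation between min(A,r) and max(A,r)."""
    if v < a_min:
        return 0
    if v >= a_max:
        return n_r
    return n_r * (v - a_min) / (a_max - a_min)

# 10,000 tuples, A assumed uniform in [0, 100]
print(estimate_le(25, 10_000, 0, 100))  # 2500.0
print(estimate_le(-5, 10_000, 0, 100))  # 0
```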
104. STATISTICS FOR COST ESTIMATION - SIZE ESTIMATION
OF COMPLEX SELECTIONS
The selectivity of a condition θi is the probability that a tuple in
the relation r satisfies θi.
If si is the number of satisfying tuples in r, the selectivity of θi is
given by si / nr.
Conjunction: σθ1∧θ2∧…∧θn(r). Assuming independence, estimate of
tuples in the result is:
nr · (s1 · s2 · … · sn) / nr^n
Disjunction: σθ1∨θ2∨…∨θn(r). Estimated number of tuples:
nr · (1 – (1 – s1/nr) · (1 – s2/nr) · … · (1 – sn/nr))
Negation: σ¬θ(r). Estimated number of tuples:
nr – size(σθ(r))
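Under the independence assumption, both estimates are one-liners (a sketch with made-up match counts si):

```python
def conjunction(n_r, counts):
    """sigma_theta1 AND ... AND theta_n: nr * (s1/nr) * ... * (sn/nr)."""
    est = n_r
    for s in counts:
        est *= s / n_r
    return est

def disjunction(n_r, counts):
    """sigma_theta1 OR ... OR theta_n: nr * (1 - prod(1 - si/nr))."""
    miss = 1.0
    for s in counts:
        miss *= 1 - s / n_r
    return n_r * (1 - miss)

# nr = 1,000; theta1 matches 500 tuples, theta2 matches 250
print(conjunction(1_000, [500, 250]))  # 125.0
print(disjunction(1_000, [500, 250]))  # 625.0
```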
105. STATISTICS FOR COST ESTIMATION - JOIN
OPERATION: RUNNING EXAMPLE
Running example:
depositor ⋈ customer
Catalog information for join examples:
ncustomer = 10,000.
fcustomer = 25, which implies that
bcustomer =10000/25 = 400.
ndepositor = 5000.
fdepositor = 50, which implies that
bdepositor = 5000/50 = 100.
V(customer_name, depositor) = 2500, which implies that , on
average, each customer has two accounts.
Also assume that customer_name in depositor is a foreign key on
customer.
V(customer_name, customer) = 10000 (primary key!)
106. STATISTICS FOR COST ESTIMATION - ESTIMATION
OF THE SIZE OF JOINS
The Cartesian product r × s contains nr · ns tuples; each
tuple occupies sr + ss bytes.
If R ∩ S = ∅, then r ⋈ s is the same as r × s.
If R ∩ S is a key for R, then a tuple of s will join with at
most one tuple from r
therefore, the number of tuples in r ⋈ s is no greater than the
number of tuples in s.
If R ∩ S is a foreign key in S referencing R, then the
number of tuples in r ⋈ s is exactly the same as the
number of tuples in s.
The case for R ∩ S being a foreign key referencing S is symmetric.
In the example query depositor ⋈ customer,
customer_name in depositor is a foreign key referencing customer
hence, the result has exactly ndepositor tuples, which is 5000
107. STATISTICS FOR COST ESTIMATION - ESTIMATION OF
THE SIZE OF JOINS (CONT.)
If R ∩ S = {A} is not a key for R or S:
If we assume that every tuple t in R produces tuples in R ⋈ S, the
number of tuples in R ⋈ S is estimated to be:
nr · ns / V(A,s)
If the reverse is true, the estimate obtained will be:
nr · ns / V(A,r)
The lower of these two estimates is probably the more accurate one.
Can improve on above if histograms are available
Use formula similar to above, for each cell of histograms on the
two relations
108. [CONTD…]
Compute the size estimates for depositor ⋈ customer
without using information about foreign keys:
V(customer_name, depositor) = 2500, and
V(customer_name, customer) = 10000
The two estimates are 5000 * 10000/2500 = 20,000 and 5000
* 10000/10000 = 5000
We choose the lower estimate, which in this case, is the same
as our earlier computation using foreign keys.
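A quick check of the two estimates (the helper is our own; integer division is used since sizes are tuple counts):

```python
def join_size_estimates(n_r, n_s, v_a_r, v_a_s):
    """Size estimates for r ⋈ s on common attribute A when A is a key
    for neither relation: nr*ns/V(A,s) and nr*ns/V(A,r).
    The lower estimate is usually the more accurate one."""
    return n_r * n_s // v_a_s, n_r * n_s // v_a_r

est = join_size_estimates(5000, 10000, 2500, 10000)
print(est, min(est))  # (5000, 20000) 5000
```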
109. STATISTICS FOR COST ESTIMATION - SIZE
ESTIMATION FOR OTHER OPERATIONS
Projection: estimated size of ΠA(r) = V(A,r)
Aggregation: estimated size of AγF(r) = V(A,r)
Set operations
For unions/intersections of selections on the same relation:
rewrite and use size estimate for selections
E.g. σθ1(r) ∪ σθ2(r) can be rewritten as σθ1∨θ2(r)
For operations on different relations:
estimated size of r ∪ s = size of r + size of s.
estimated size of r ∩ s = minimum of size of r and size of s.
estimated size of r – s = size of r.
All three estimates may be quite inaccurate, but provide upper
bounds on the sizes.
110. [CONTD…]
Outer join:
Estimated size of r ⟕ s = size of r ⋈ s + size of r
Case of right outer join is symmetric
Estimated size of r ⟗ s = size of r ⋈ s + size of r + size of s
111. STATISTICS FOR COST ESTIMATION - ESTIMATION OF
NUMBER OF DISTINCT VALUES
Selections: σθ(r)
If θ forces A to take a specified value: V(A, σθ(r)) = 1.
e.g., A = 3
If θ forces A to take on one of a specified set of values:
V(A, σθ(r)) = number of specified values.
(e.g., (A = 1 ∨ A = 3 ∨ A = 4))
If the selection condition θ is of the form A op v
estimated V(A, σθ(r)) = V(A,r) * s
where s is the selectivity of the selection.
In all other cases: use approximate estimate of
min(V(A,r), nσθ(r))
More accurate estimates can be obtained using probability theory,
but this one works fine generally
112. [CONTD…]
Joins: r ⋈ s
If all attributes in A are from r
estimated V(A, r ⋈ s) = min(V(A,r), nr⋈s)
If A contains attributes A1 from r and A2 from s, then
estimated V(A, r ⋈ s) =
min(V(A1,r) · V(A2 – A1,s), V(A1 – A2,r) · V(A2,s), nr⋈s)
More accurate estimates can be obtained using probability theory,
but this one works fine generally
113. [CONTD…]
Estimation of distinct values is straightforward for
projections.
They are the same in ΠA(r) as in r.
The same holds for grouping attributes of aggregation.
For aggregated values
For min(A) and max(A), the number of distinct values can be
estimated as min(V(A,r), V(G,r)) where G denotes grouping
attributes
For other aggregates, assume all values are distinct, and use
V(G,r)
114. OPTIMIZING NESTED SUBQUERIES
Nested query example:
select customer_name
from borrower
where exists (select *
from depositor
where depositor.customer_name =
borrower.customer_name)
SQL conceptually treats nested subqueries in the where clause as functions
that take parameters and return a single value or set of values
Parameters are variables from outer level query that are used in the nested
subquery; such variables are called correlation variables
Conceptually, nested subquery is executed once for each tuple in the
cross-product generated by the outer level from clause
Such evaluation is called correlated evaluation
Note: other conditions in where clause may be used to compute a join (instead
of a cross-product) before executing the nested subquery
115. [CONTD…]
Correlated evaluation may be quite inefficient since
a large number of calls may be made to the nested query
there may be unnecessary random I/O as a result
SQL optimizers attempt to transform nested subqueries to joins where
possible, enabling use of efficient join techniques
E.g.: earlier nested query can be rewritten as
select customer_name
from borrower, depositor
where depositor.customer_name = borrower.customer_name
Note: the two queries generate different numbers of duplicates (why?)
Borrower can have duplicate customer-names
Can be modified to handle duplicates correctly as we will see
In general, it is not possible/straightforward to move the entire nested
subquery from clause into the outer level query from clause
A temporary relation is created instead, and used in body of outer level query
116. [CONTD…]
In general, SQL queries of the form below can be rewritten as shown
Rewrite: select …
from L1
where P1 and exists (select *
from L2
where P2)
To: create table t1 as
select distinct V
from L2
where P2¹
select …
from L1, t1
where P1 and P2²
P2¹ contains predicates in P2 that do not involve any correlation
variables
P2² reintroduces predicates involving correlation variables, with
relations renamed appropriately
V contains all attributes used in predicates with correlation variables
117. [CONTD…]
In our example, the original nested query would be transformed to
create table t1 as
select distinct customer_name
from depositor
select customer_name
from borrower, t1
where t1.customer_name = borrower.customer_name
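The rewrite can be exercised end to end with sqlite3 (the table contents are invented). Building t1 with select distinct is what preserves the duplicate counts of the original exists query:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    create table borrower(customer_name text);
    create table depositor(customer_name text);
    insert into borrower values ('Ann'), ('Ann'), ('Bob'), ('Carl');
    insert into depositor values ('Ann'), ('Ann'), ('Carl');
""")

# original correlated form
nested = con.execute("""
    select customer_name from borrower b
    where exists (select * from depositor d
                  where d.customer_name = b.customer_name)
""").fetchall()

# decorrelated form: temporary relation t1, then a plain join
con.execute("create table t1 as select distinct customer_name from depositor")
decorrelated = con.execute("""
    select b.customer_name from borrower b, t1
    where t1.customer_name = b.customer_name
""").fetchall()

print(sorted(nested) == sorted(decorrelated))  # True
```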
The process of replacing a nested query by a query with a join (possibly
with a temporary relation) is called decorrelation.
Decorrelation is more complicated when
the nested subquery uses aggregation, or
when the result of the nested subquery is used to test for equality, or
when the condition linking the nested subquery to the other
query is not exists,
and so on.
118. MATERIALIZED VIEWS
A materialized view is a view whose contents are computed
and stored.
Consider the view
create view branch_total_loan(branch_name, total_loan) as
select branch_name, sum(amount)
from loan
group by branch_name
Materializing the above view would be very useful if the total
loan amount is required frequently
Saves the effort of finding multiple tuples and adding up
their amounts
119. MATERIALIZED VIEW MAINTENANCE
The task of keeping a materialized view up-to-date with the
underlying data is known as materialized view maintenance
Materialized views can be maintained by recomputation on every
update
A better option is to use incremental view maintenance
Changes to database relations are used to compute changes to the
materialized view, which is then updated
View maintenance can be done by
Manually defining triggers on insert, delete, and update of each relation
in the view definition
Manually written code to update the view whenever database relations
are updated
Periodic recomputation (e.g. nightly)
Above methods are directly supported by many database systems
Avoids manual effort/correctness issues
120. INCREMENTAL VIEW MAINTENANCE
The changes (inserts and deletes) to a relation or expressions are
referred to as its differential
Set of tuples inserted to and deleted from r are denoted ir and dr
To simplify our description, we only consider inserts and deletes
We replace updates to a tuple by deletion of the tuple followed
by insertion of the updated tuple
We describe how to compute the change to the result of each
relational operation, given changes to its inputs
We then outline how to handle relational algebra expressions
121. JOIN OPERATION
Consider the materialized view v = r ⋈ s and an update to r
Let rold and rnew denote the old and new states of relation r
Consider the case of an insert to r:
We can write rnew ⋈ s as (rold ∪ ir) ⋈ s
And rewrite the above to (rold ⋈ s) ∪ (ir ⋈ s)
But (rold ⋈ s) is simply the old value of the materialized view, so
the incremental change to the view is just ir ⋈ s
Thus, for inserts vnew = vold ∪ (ir ⋈ s)
Similarly for deletes vnew = vold – (dr ⋈ s)
E.g., if r = {(A,1), (B,2)} and s = {(1,p), (2,r), (2,s)}, then
v = r ⋈ s = {(A,1,p), (B,2,r), (B,2,s)}; inserting (C,2) into r adds
(C,2,r) and (C,2,s) to the view.
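The insert and delete rules can be sketched with relations as Python sets and the equijoin from the example (set semantics, so subtraction is safe; bag semantics would need counts):

```python
def join(r, s):
    # equijoin on r's 2nd column = s's 1st column
    return {(a, b, c) for (a, b) in r for (b2, c) in s if b == b2}

r_old = {("A", 1), ("B", 2)}
s = {(1, "p"), (2, "r"), (2, "s")}
v = join(r_old, s)            # materialized view v = r ⋈ s

i_r = {("C", 2)}              # tuples inserted into r
v = v | join(i_r, s)          # v_new = v_old ∪ (i_r ⋈ s)
print(sorted(v))

d_r = {("B", 2)}              # tuples deleted from r
v = v - join(d_r, s)          # v_new = v_old − (d_r ⋈ s)
print(sorted(v))
```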
122. SELECTION AND PROJECTION OPERATIONS
Selection: Consider a view v = σθ(r).
vnew = vold ∪ σθ(ir)
vnew = vold – σθ(dr)
Projection is a more difficult operation
R = (A,B), and r(R) = {(a,2), (a,3)}
ΠA(r) has a single tuple (a).
If we delete the tuple (a,2) from r, we should not delete the tuple (a)
from ΠA(r), but if we then delete (a,3) as well, we should delete the
tuple
For each tuple in a projection ΠA(r), we will keep a count of how many
times it was derived
On insert of a tuple to r, if the resultant tuple is already in ΠA(r) we
increment its count, else we add a new tuple with count = 1
On delete of a tuple from r, we decrement the count of the
corresponding tuple in ΠA(r)
if the count becomes 0, we delete the tuple from ΠA(r)
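The count-based scheme can be sketched as a map from projected value to derivation count (an illustration, not any system's API):

```python
from collections import Counter

class CountedProjection:
    """Maintain Pi_A(r) incrementally: each projected tuple carries the
    number of r-tuples it was derived from."""
    def __init__(self, attr_index):
        self.attr_index = attr_index
        self.counts = Counter()

    def insert(self, tup):
        self.counts[tup[self.attr_index]] += 1

    def delete(self, tup):
        key = tup[self.attr_index]
        self.counts[key] -= 1
        if self.counts[key] == 0:   # last derivation gone: drop the tuple
            del self.counts[key]

    def result(self):
        return set(self.counts)

p = CountedProjection(0)            # project onto attribute A
for t in [("a", 2), ("a", 3)]:
    p.insert(t)
p.delete(("a", 2))
print(p.result())                   # {'a'} — (a) survives, count is 1
p.delete(("a", 3))
print(p.result())                   # set() — now (a) is removed
```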
123. AGGREGATION OPERATIONS
count: v = Aγcount(B)(r).
When a set of tuples ir is inserted
For each tuple t in ir, if the corresponding group is already present in v, we
increment its count, else we add a new tuple with count = 1
When a set of tuples dr is deleted
for each tuple t in dr, we look for the group t.A in v, and subtract 1 from the count
for the group.
If the count becomes 0, we delete from v the tuple for the group t.A
sum: v = Aγsum(B)(r)
We maintain the sum in a manner similar to count, except we add/subtract the B
value instead of adding/subtracting 1 for the count
Additionally we maintain the count in order to detect groups with no tuples. Such
groups are deleted from v
Cannot simply test for sum = 0 (why?)
To handle the case of avg, we maintain the sum and count
aggregate values separately, and divide at the end
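A sketch of maintaining Aγsum(B)(r) with a (sum, count) pair per group; the example shows why testing sum = 0 would wrongly drop a non-empty group:

```python
class SumView:
    """Incrementally maintain A gamma sum(B) (r): per group keep
    [sum, count]; a group is dropped when its count reaches 0,
    not when its sum does."""
    def __init__(self):
        self.groups = {}                # group key -> [sum, count]

    def insert(self, a, b):
        g = self.groups.setdefault(a, [0, 0])
        g[0] += b
        g[1] += 1

    def delete(self, a, b):
        g = self.groups[a]
        g[0] -= b
        g[1] -= 1
        if g[1] == 0:                   # no tuples left in the group
            del self.groups[a]

    def sums(self):
        return {a: s for a, (s, _) in self.groups.items()}

v = SumView()
v.insert("g1", 5)
v.insert("g1", -5)                      # sum is 0 but the group is non-empty
print(v.sums())                         # {'g1': 0}
v.delete("g1", 5)
print(v.sums())                         # {'g1': -5}
v.delete("g1", -5)
print(v.sums())                         # {}
```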
124. [CONTD…]
min, max: v = Aγmin(B)(r).
Handling insertions on r is straightforward.
Maintaining the aggregate values min and max on deletions
may be more expensive. We have to look at the other tuples
of r that are in the same group to find the new minimum
125. OTHER OPERATIONS
Set intersection: v = r ∩ s
when a tuple is inserted in r we check if it is present in s, and
if so we add it to v.
If the tuple is deleted from r, we delete it from the
intersection if it is present.
Updates to s are symmetric
The other set operations, union and set difference are handled
in a similar fashion.
Outer joins are handled in much the same way as joins
but with some extra work
we leave details to you.
126. HANDLING EXPRESSIONS
To handle an entire expression, we derive expressions for
computing the incremental change to the result of each
sub-expression, starting from the smallest sub-expressions.
E.g. consider E1 ⋈ E2 where each of E1 and E2 may be a
complex expression
Suppose the set of tuples to be inserted into E1 is given by D1
Computed earlier, since smaller sub-expressions are handled first
Then the set of tuples to be inserted into E1 ⋈ E2 is given by
D1 ⋈ E2
This is just the usual way of maintaining joins
127. QUERY OPTIMIZATION AND MATERIALIZED VIEWS
Rewriting queries to use materialized views:
A materialized view v = r ⋈ s is available
A user submits a query r ⋈ s ⋈ t
We can rewrite the query as v ⋈ t
Whether to do so depends on cost estimates for the two alternatives
Replacing a use of a materialized view by the view definition:
A materialized view v = r ⋈ s is available, but without any index
on it
User submits a query σA=10(v).
Suppose also that s has an index on the common attribute B, and r
has an index on attribute A.
The best plan for this query may be to replace v by r ⋈ s, which can
lead to the query plan σA=10(r) ⋈ s
Query optimizer should be extended to consider all the above
alternatives and choose the best overall plan
128. MATERIALIZED VIEW SELECTION
Materialized view selection: “What is the best set of views to
materialize?”.
Index selection: “What is the best set of indices to create?”
closely related to materialized view selection
but simpler
Materialized view selection and index selection based on
typical system workload (queries and updates)
Typical goal: minimize time to execute workload , subject to
constraints on space and time taken for some critical queries/updates
One of the steps in database tuning
more on tuning in later chapters
Commercial database systems provide tools (called “tuning
assistants” or “wizards”) to help the database administrator
choose what indices and materialized views to create
132. OUTLINE
Distributed Database Systems – Introduction
Distributed Data Storage
Distributed Transaction
Commit Protocol
133. I. DISTRIBUTED DATABASE SYSTEM
A distributed database system consists of loosely coupled sites
that share no physical component
Database systems that run on each site are independent of each
other
Transactions may access data at one or more sites
134. TYPES OF DISTRIBUTED DATABASES
In a homogeneous distributed database
All sites have identical software
Are aware of each other and agree to cooperate in processing
user requests.
Each site surrenders part of its autonomy in terms of right to
change schemas or software
Appears to user as a single system
In a heterogeneous distributed database
Different sites may use different schemas and software
Difference in schema is a major problem for query processing
Difference in software is a major problem for transaction processing
Sites may not be aware of each other and may provide only
limited facilities for cooperation in transaction processing
135. II. DISTRIBUTED DATA STORAGE
There are two approaches to store the relation in the
distributed database:
Replication: The system maintains several identical replicas
(copies) of the relation, and stores each replica at a different
site. The alternative to replication is to store only one copy of
relation r.
Fragmentation: The system partitions the relation into
several fragments, and stores each fragment at a different site.
136. 1. DATA REPLICATION
A relation or fragment of a relation is replicated if it is
stored redundantly in two or more sites.
Full replication of a relation is the case where the relation
is stored at all sites.
Fully redundant databases are those in which every site
contains a copy of the entire database.
137. [CONTD…]
Advantages of Replication
Availability: failure of a site containing relation r does not result in
unavailability of r if replicas exist.
Parallelism: queries on r may be processed by several nodes in parallel.
Reduced data transfer: relation r is available locally at each site
containing a replica of r.
Disadvantages of Replication
Increased cost of updates: each replica of relation r must be updated.
Increased complexity of concurrency control: concurrent updates to
distinct replicas may lead to inconsistent data unless special concurrency
control mechanisms are implemented.
One solution: choose one copy as primary copy and apply concurrency control
operations on primary copy
138. 2. DATA FRAGMENTATION
Division of relation r into fragments r1, r2, …, rn which contain
sufficient information to reconstruct relation r.
Horizontal fragmentation: each tuple of r is assigned to one or more
fragments
Vertical fragmentation: the schema for relation r is split into several
smaller schemas
All schemas must contain a common candidate key (or superkey) to
ensure lossless join property.
A special attribute, the tuple-id attribute, may be added to each
schema to serve as a candidate key.
Example: relation account with the following schema:
Account = (account_number, branch_name, balance)
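The two fragmentation schemes for the Account relation can be sketched as follows. This is a minimal, hypothetical illustration using plain Python dicts as tuples; the sample account data and helper names are invented, not from the slides.

```python
# Hypothetical sample data for the Account relation from the example.
accounts = [
    {"account_number": "A-101", "branch_name": "Hillside", "balance": 500},
    {"account_number": "A-215", "branch_name": "Hillside", "balance": 700},
    {"account_number": "A-305", "branch_name": "Valleyview", "balance": 350},
]

# Horizontal fragmentation: each tuple is assigned to the fragment
# (site) for its branch; the union of fragments reconstructs r.
def horizontal_fragment(rel, branch):
    return [t for t in rel if t["branch_name"] == branch]

hillside = horizontal_fragment(accounts, "Hillside")

# Vertical fragmentation: split the schema, adding a tuple-id
# attribute so the fragments can be joined back losslessly.
def vertical_fragment(rel, attrs):
    return [{**{"tid": i}, **{a: t[a] for a in attrs}}
            for i, t in enumerate(rel)]

frag1 = vertical_fragment(accounts, ["account_number", "branch_name"])
frag2 = vertical_fragment(accounts, ["balance"])

# Lossless reconstruction: natural join of the fragments on tuple-id.
def rejoin(f1, f2):
    by_tid = {t["tid"]: t for t in f2}
    return [{k: v for k, v in {**t, **by_tid[t["tid"]]}.items()
             if k != "tid"}
            for t in f1]

assert rejoin(frag1, frag2) == accounts
```

The assertion at the end demonstrates the lossless-join property that the common candidate key (here the tuple-id) guarantees.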
141. ADVANTAGES OF FRAGMENTATION
Horizontal:
allows parallel processing on fragments of a relation
allows a relation to be split so that tuples are located where they are most
frequently accessed
Vertical:
allows tuples to be split so that each part of the tuple is stored where it is most
frequently accessed
tuple-id attribute allows efficient joining of vertical fragments
Vertical and horizontal fragmentation can be mixed.
Fragments may be successively fragmented to an arbitrary depth.
Replication and fragmentation can be combined
Relation is partitioned into several fragments: system maintains several
identical replicas of each such fragment.
142. 3. DATA TRANSPARENCY
Data transparency: Degree to which system user may
remain unaware of the details of how and where the data
items are stored in a distributed system
Consider transparency issues in relation to:
Fragmentation transparency
Replication transparency
Location transparency
Naming of data items: criteria
1. Every data item must have a system-wide unique name.
2. It should be possible to find the location of data items efficiently.
3. It should be possible to change the location of data items
transparently.
4. Each site should be able to create new data items autonomously.
143. CENTRALIZED SCHEME - NAME SERVER
Structure:
name server assigns all names
each site maintains a record of local data items
sites ask name server to locate non-local data items
Advantages:
satisfies naming criteria 1-3
Disadvantages:
does not satisfy naming criterion 4
name server is a potential performance bottleneck
name server is a single point of failure
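The centralized naming scheme can be sketched as a small mapping service. This is a hypothetical sketch (class and method names are invented) showing how criteria 1-3 are met while criterion 4 fails, since every registration must go through the server.

```python
# Hypothetical sketch of a centralized name server: it assigns all
# names and records which site holds each data item.
class NameServer:
    def __init__(self):
        self.catalog = {}  # name -> site (criteria 1 and 2)

    def register(self, name, site):
        # Criterion 4 fails: no site can create an item without
        # contacting this server, which is also a bottleneck and a
        # single point of failure.
        if name in self.catalog:
            raise ValueError("name already taken")
        self.catalog[name] = site

    def locate(self, name):
        return self.catalog[name]

    def move(self, name, new_site):
        # Criterion 3: relocation is transparent to clients, who
        # keep using the same system-wide name.
        self.catalog[name] = new_site

ns = NameServer()
ns.register("account", "site_17")
assert ns.locate("account") == "site_17"
ns.move("account", "site_3")
assert ns.locate("account") == "site_3"
```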
144. USE OF ALIASES
Alternative to centralized scheme: each site prefixes its
own site identifier to any name that it generates, e.g.,
site17.account.
Fulfills having a unique identifier, and avoids problems associated
with central control.
However, fails to achieve network transparency.
Solution: Create a set of aliases for data items; Store the
mapping of aliases to the real names at each site.
The user can be unaware of the physical location of a data
item, and is unaffected if the data item is moved from one
site to another.
145. III. DISTRIBUTED TRANSACTIONS
SYSTEM ARCHITECTURE
Transaction may access data at several sites.
Each site has a local transaction manager responsible for:
Maintaining a log for recovery purposes
Participating in coordinating the concurrent execution of the
transactions executing at that site.
Each site has a transaction coordinator, which is responsible for:
Starting the execution of transactions that originate at the site.
Distributing subtransactions to appropriate sites for execution.
Coordinating the termination of each transaction that originates
at the site, which may result in the transaction being committed
at all sites or aborted at all sites.
147. SYSTEM FAILURE MODES
Failures unique to distributed systems:
Failure of a site.
Loss of messages
Handled by network transmission control protocols such as TCP/IP
Failure of a communication link
Handled by network protocols, by routing messages via alternative
links
Network partition
A network is said to be partitioned when it has been split into two or
more subsystems that lack any connection between them
Note: a subsystem may consist of a single node
Network partitioning and site failures are generally
indistinguishable.
148. COMMIT PROTOCOLS
Commit protocols are used to ensure atomicity across
sites
a transaction which executes at multiple sites must either be
committed at all the sites, or aborted at all the sites.
not acceptable to have a transaction committed at one site and
aborted at another
The two-phase commit (2PC) protocol is widely used
The three-phase commit (3PC) protocol is more
complicated and more expensive, but avoids some
drawbacks of two-phase commit protocol. This protocol
is not used in practice.
149. TWO PHASE COMMIT PROTOCOL (2PC)
Assumes fail-stop model – failed sites simply stop
working, and do not cause any other harm, such as
sending incorrect messages to other sites.
Execution of the protocol is initiated by the coordinator
after the last step of the transaction has been reached.
The protocol involves all the local sites at which the
transaction executed
Let T be a transaction initiated at site Si, and let the
transaction coordinator at Si be Ci
150. PHASE 1: OBTAINING A DECISION
Coordinator asks all participants to prepare to commit
transaction T.
Ci adds the records <prepare T> to the log and forces log to
stable storage
sends prepare T messages to all sites at which T executed
Upon receiving message, transaction manager at site
determines if it can commit the transaction
if not, add a record <no T> to the log and send abort T
message to Ci
if the transaction can be committed, then:
add the record <ready T> to the log
force all records for T to stable storage
send ready T message to Ci
151. PHASE 2: RECORDING THE DECISION
T can be committed if Ci received a ready T message
from all the participating sites; otherwise T must be
aborted.
Coordinator adds a decision record, <commit T> or
<abort T>, to the log and forces record onto stable
storage. Once the record reaches stable storage, it is
irrevocable (even if failures occur).
Coordinator sends a message to each participant
informing it of the decision (commit or abort)
Participants take appropriate action locally.
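The two phases above can be sketched as a single coordinator-side function. This is a hypothetical simulation (function names and the vote strings are invented): each participant is modeled as a callable returning its vote, and the coordinator commits only if every vote is ready.

```python
# Hypothetical sketch of 2PC from the coordinator's point of view.
def two_phase_commit(participants):
    log = ["<prepare T>"]                # phase 1: force prepare record
    votes = [p() for p in participants]  # send prepare T, collect votes
    if all(v == "ready" for v in votes):
        log.append("<commit T>")         # phase 2: decision is forced to
        decision = "commit"              # stable storage, then broadcast
    else:
        log.append("<abort T>")          # any abort (or missing) vote
        decision = "abort"               # aborts T at all sites
    return decision, log

# Participants simulated as vote functions.
ok = lambda: "ready"
refuses = lambda: "abort"

assert two_phase_commit([ok, ok, ok])[0] == "commit"
assert two_phase_commit([ok, refuses, ok])[0] == "abort"
```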
152. HANDLING OF FAILURES - SITE FAILURE
When site Sk recovers, it examines its log to determine the
fate of transactions active at the time of the failure.
Log contains <commit T> record: site executes redo (T)
Log contains <abort T> record: site executes undo (T)
Log contains <ready T> record: site must consult Ci to
determine the fate of T.
If T committed, redo (T)
If T aborted, undo (T)
If the log contains no control records concerning T, then
Sk failed before responding to the prepare T message
from Ci
Sk must execute undo (T)
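The recovery rules above reduce to a short case analysis over the log. A minimal sketch, with invented function names; `consult_coordinator` stands in for asking Ci (or another site) for T's fate when the site is in doubt.

```python
# Hypothetical sketch: decide the recovery action for T from the
# control records found in a recovering site's log.
def recover(log, consult_coordinator):
    if "<commit T>" in log:
        return "redo(T)"                 # decision is known: redo
    if "<abort T>" in log:
        return "undo(T)"                 # decision is known: undo
    if "<ready T>" in log:
        # In-doubt case: the site voted ready but never learned the
        # outcome, so it must ask the coordinator.
        fate = consult_coordinator()
        return "redo(T)" if fate == "commit" else "undo(T)"
    # No control records: the site failed before voting, so the
    # coordinator cannot have committed T; undo is safe.
    return "undo(T)"

assert recover(["<ready T>", "<commit T>"], lambda: None) == "redo(T)"
assert recover(["<ready T>"], lambda: "commit") == "redo(T)"
assert recover([], lambda: None) == "undo(T)"
```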
153. HANDLING OF FAILURES- COORDINATOR FAILURE
If coordinator fails while the commit protocol for T is executing then
participating sites must decide on T’s fate:
1. If an active site contains a <commit T> record in its log, then T must
be committed.
2. If an active site contains an <abort T> record in its log, then T must
be aborted.
3. If some active participating site does not contain a <ready T> record
in its log, then the failed coordinator Ci cannot have decided to
commit T.
Ci can therefore abort T.
4. If none of the above cases holds, then all active sites must have a
<ready T> record in their logs, but no additional control records
(such as <abort T> or <commit T>).
In this case active sites must wait for Ci to recover, to find decision.
Blocking problem: active sites may have to wait for failed coordinator
to recover.
154. HANDLING OF FAILURES - NETWORK PARTITION
If the coordinator and all its participants remain in one partition, the
failure has no effect on the commit protocol.
If the coordinator and its participants belong to several partitions:
Sites that are not in the partition containing the coordinator think
the coordinator has failed, and execute the protocol to deal with
failure of the coordinator.
No harm results, but sites may still have to wait for decision from
coordinator.
The coordinator and the sites that are in the same partition as
the coordinator think that the sites in the other partition have
failed, and follow the usual commit protocol.
Again, no harm results
156. OUTLINE
Object-based Databases: Object Database Concepts
Object-Relational features
Object Data Management Group (ODMG) Object Model
Object Definition Language (ODL)
Object Query Language (OQL)
157. I. OBJECT ORIENTED CONCEPTS
Extend the relational data model by including object
orientation and constructs to deal with added data types.
Allow attributes of tuples to have complex types,
including non-atomic values such as nested relations.
Preserve relational foundations, in particular the
declarative access to data, while extending modeling
power.
Upward compatibility with existing relational languages.
158. COMPLEX DATA TYPES
Motivation:
Permit non-atomic domains (atomic = indivisible)
Example of a non-atomic domain: set of integers, or set of
tuples
Allows more intuitive modeling for applications with
complex data
Intuitive definition:
allow relations whenever we allow atomic (scalar) values —
relations within relations
Retains mathematical foundation of relational model
Violates first normal form.
159. EXAMPLE OF A NESTED RELATION
Example: library information system
Each book has
title,
a set of authors,
Publisher, and
a set of keywords
Non-1NF relation books
160. [CONTD…]
4NF DECOMPOSITION OF NESTED RELATION
Remove awkwardness of flat-books by assuming that the
following multivalued dependencies hold:
title →→ author
title →→ keyword
title →→ pub-name, pub-branch
Decompose flat-books into 4NF using the schemas:
(title, author )
(title, keyword )
(title, pub-name, pub-branch )
162. PROBLEM WITH 4NF SCHEME
4NF design requires users to include joins in their
queries.
1NF relational view flat-books defined by join of 4NF
relations:
eliminates the need for users to perform joins,
but loses the one-to-one correspondence between tuples and
documents.
And has a large amount of redundancy
Nested relations representation is much more natural
here.
163. II. OBJECT-RELATIONAL FEATURES
Structured types can be declared and used in SQL
create type Name as
(firstname varchar(20),
lastname varchar(20))
final
create type Address as
(street varchar(20),
city varchar(20),
zipcode varchar(20))
not final
Note: final and not final indicate whether subtypes can be created
Structured types can be used to create tables with composite attributes
create table customer (
name Name,
address Address,
dateOfBirth date)
Dot notation used to reference components: name.firstname
164. [CONTD…]
User-defined row types
create type CustomerType as (
name Name,
address Address,
dateOfBirth date)
not final
Can then create a table whose rows are a user-defined
type
create table customer of CustomerType
165. [CONTD…]
Alternative way of defining composite attributes in SQL
is to use unnamed row types.
create table person_r (
name row (firstname varchar(20),
lastname varchar(20)),
address row (street varchar(20),
city varchar(20),
zipcode varchar(9)),
dateOfBirth date);
The following query finds the last name and city of each person:
select name.lastname, address.city from person_r;
166. [CONTD…]
Methods
Can add a method declaration with a structured type.
method ageOnDate (onDate date)
returns interval year
Method body is given separately.
create instance method ageOnDate (onDate date)
returns interval year
for CustomerType
begin
return onDate - self.dateOfBirth;
end
We can now find the age of each customer:
select name.lastname, ageOnDate (current_date) from customer
167. [CONTD…]
Constructor
create function Name (firstname varchar(20), lastname
varchar(20))
returns Name
begin
set self.firstname = firstname;
set self.lastname = lastname;
end
Inserting
insert into Person values (new Name(’John’, ’Smith’), new
Address(’20 Main St’, ’New York’, ’11001’), date ’1960-8-
22’);
168. [CONTD…]
Inheritance
Suppose that we have the following type definition for
people:
create type Person
(name varchar(20),
address varchar(20))
Using inheritance to define the student and teacher types
create type Student under Person
(degree varchar(20),
department varchar(20))
create type Teacher under Person
(salary integer,
department varchar(20))
Subtypes can redefine methods by using overriding method
in place of method in the method declaration
169. [CONTD…]
Multiple Inheritance
SQL:1999 and SQL:2003 do not support multiple inheritance
If our type system supports multiple inheritance, we can define a type
for teaching assistant as follows:
create type Teaching Assistant
under Student, Teacher
To avoid a conflict between the two occurrences of department we can
rename them
create type Teaching Assistant under
Student with (department as student_dept ),
Teacher with (department as teacher_dept )
170. [CONTD…]
Array and Multiset Types in SQL
Example of array and multiset declaration:
create type Publisher as
(name varchar(20),
branch varchar(20))
create type Book as
(title varchar(20),
author-array varchar(20) array [10],
pub-date date,
publisher Publisher,
keyword-set varchar(20) multiset )
create table books of Book
Similar to the nested relation books, but with array of authors
instead of set
171. [CONTD…]
Array construction
array ['Silberschatz', 'Korth', 'Sudarshan']
Multisets
multiset ['computer', 'database', 'SQL']
To create a tuple of the type defined by the books relation:
('Compilers', array['Smith', 'Jones'],
Publisher('McGraw-Hill', 'New York'),
multiset ['parsing', 'analysis'])
To insert the preceding tuple into the relation books
insert into books
values ('Compilers', array['Smith', 'Jones'],
Publisher('McGraw-Hill', 'New York'),
multiset ['parsing', 'analysis'])
172. UNNESTING
The transformation of a nested relation into a form with
fewer (or no) relation-valued attributes is called
unnesting.
E.g.
select title, A.author, publisher.name as pub_name,
publisher.branch as pub_branch, K.keyword
from books as B, unnest(B.author_array) as A(author),
unnest(B.keyword_set) as K(keyword)
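The effect of the unnest query can be sketched procedurally: one output tuple per (author, keyword) combination of each book. A hypothetical illustration with invented sample data, not the slides' SQL.

```python
# Hypothetical nested books relation: set- and array-valued attributes.
books = [
    {"title": "Compilers",
     "author_array": ["Smith", "Jones"],
     "publisher": {"name": "McGraw-Hill", "branch": "New York"},
     "keyword_set": {"parsing", "analysis"}},
]

def unnest(rel):
    # Flatten each nested tuple into 1NF rows: the cross product of
    # its authors and keywords, with scalar attributes repeated.
    flat = []
    for b in rel:
        for author in b["author_array"]:
            for keyword in sorted(b["keyword_set"]):
                flat.append((b["title"], author,
                             b["publisher"]["name"],
                             b["publisher"]["branch"],
                             keyword))
    return flat

flat_books = unnest(books)
assert len(flat_books) == 4  # 2 authors x 2 keywords
```

Note how the scalar attributes (title, publisher) are duplicated across the flattened rows, which is exactly the redundancy the nested representation avoids.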
173. NESTING
Nesting is the opposite of unnesting, creating a collection-valued
attribute
NOTE: SQL:1999 does not support nesting
Nesting can be done in a manner similar to aggregation, but using
the function collect() in place of an aggregation operation, to create a
multiset
To nest the flat-books relation on the attribute keyword:
select title, author, Publisher(pub_name, pub_branch) as publisher,
collect(keyword) as keyword_set
from flat-books
group by title, author, publisher
To nest on both authors and keywords:
select title, collect(author) as author_set,
Publisher(pub_name, pub_branch) as publisher,
collect(keyword) as keyword_set
from flat-books
group by title, publisher
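The role of collect() in the queries above, grouping flat rows back into a collection-valued attribute, can be sketched as follows. A hypothetical illustration; the sample rows and function name are invented.

```python
from collections import defaultdict

# Hypothetical flat 1NF rows: (title, author, publisher, keyword).
flat_books = [
    ("Compilers", "Smith", "McGraw-Hill", "parsing"),
    ("Compilers", "Smith", "McGraw-Hill", "analysis"),
    ("Networks", "Jones", "Pearson", "routing"),
]

def nest_on_keyword(rows):
    # Group by the remaining attributes and collect() the keywords
    # of each group into a set-valued attribute.
    groups = defaultdict(set)
    for title, author, publisher, keyword in rows:
        groups[(title, author, publisher)].add(keyword)
    return [{"title": t, "author": a, "publisher": p,
             "keyword_set": ks}
            for (t, a, p), ks in groups.items()]

nested = nest_on_keyword(flat_books)
assert len(nested) == 2  # one nested tuple per (title, author, publisher)
```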
174. III. OBJECT DATA MANAGEMENT GROUP
(ODMG) OBJECT MODEL
Provides a standard model for object databases
Supports object definition via ODL
Supports object querying via OQL
Supports a variety of data types and type constructors
175. ODMG OBJECTS AND LITERALS
The basic building blocks of the object model are
Objects
Literals
An object has four characteristics
Identifier: Unique system-wide identifier
Name: Unique within a particular database and/or program; it
is optional
Lifetime: persistent vs transient
Structure: specifies how object is constructed by the type
constructor and whether it is an atomic object
176. [CONTD…]
A literal has a current value but not an identifier
Three types of literals
Atomic: predefined; basic data type values (e.g. short, float,
boolean, char)
Structured: values that are constructed by type constructors
(e.g. date, struct variables)
Collection: a collection (e.g. array) of values or objects
177. [CONTD…]
ODMG supports two concepts for specifying object
types:
Interface
Class
There are similarities and differences between interfaces
and classes
Both have behaviors (operations) and state (attributes
and relationships)
178. ODMG INTERFACE
An interface is a specification of the abstract behavior of
an object type
State properties of an interface (i.e. its attributes and
relationships) cannot be inherited from
Objects cannot be instantiated from an interface
179. ODMG INTERFACE DEFINITION
interface Date:Object {
enum weekday{sun, mon, tue, wed, thu, fri, sat};
enum month{jan, feb, mar, …, dec};
unsigned short year();
unsigned short month();
unsigned short day();
boolean is_equal(in Date other_date);
};
180. BUILT-IN INTERFACES FOR COLLECTION
OBJECTS
A collection object inherits the basic collection
interface, for example:
cardinality()
is_empty()
insert_element()
remove_element()
contains_element()
create_iterator()
181. COLLECTION TYPES
Collection objects are further specialized into types such as
set, list, bag, array, and dictionary
Each collection type may provide additional interfaces,
for example, a set provides:
create_union()
create_difference()
is_subset_of()
is_superset_of()
is_proper_subset_of()
183. ODMG CLASS
A class is a specification of abstract behavior and state of
an object type
A class is instantiable
Supports “extends” inheritance to allow both state and
behavior inheritance among classes
Multiple inheritance via “extends” is not allowed
184. [CONTD…]
Atomic objects are user defined objects and are defined
via keyword class
An example:
class Employee (extent all_employees key ssn) {
attribute string name;
attribute string ssn;
attribute short age;
relationship dept works_for;
void reassign(in string new_name);
}
185. IV. OBJECT DEFINITION LANGUAGE (ODL)
ODL supports semantics constructs of ODMG
ODL is independent of any programming language
ODL is used to create object specification (classes and
interfaces)
ODL is not used for database manipulation
186. EXAMPLE 1: A VERY SIMPLE CLASS
A very simple, straightforward class definition
class Degree {
attribute string college;
attribute string degree;
attribute string year;
};
187. EXAMPLE 2: A CLASS WITH KEY AND EXTENT
class Person (extent persons key ssn) {
attribute struct Pname {string fname …} name;
attribute string ssn;
attribute date birthdate;
short age();
};
188. EXAMPLE 3: A CLASS WITH RELATIONSHIPS
class Faculty extends Person (extent faculty) {
attribute string rank;
attribute float salary;
attribute string phone;
relationship dept works_in inverse
dept :: has_faculty;
relationship set<GradStu> advises inverse
GradStu :: advisor;
void give_raise (in float raise);
void promise (in string new_rank);
};
189. EXAMPLE 4: INHERITANCE
interface Shape {
attribute struct point {…}
reference_point;
float perimeter();
};
class Triangle : Shape (extent triangles) {
attribute short side_1;
attribute short side_2;
};
190. V. OBJECT QUERY LANGUAGE (OQL)
OQL is ODMG's query language
OQL works closely with programming languages such as
C++
Embedded OQL statements return objects that are
compatible with the type system of the host language
OQL's syntax is similar to SQL, with additional features
for objects
191. SIMPLE OQL QUERIES
Basic syntax: select … from … where …
select d.name from d in departments where d.college =
‘engineering’;
An entry point to the database is needed for each query
An extent name may serve as an entry point
192. ITERATOR VARIABLES
Iterator variables are defined whenever a collection is
referenced in an OQL query
The variable d in the previous example serves as an iterator,
ranging over each object in the collection
Syntactical options for specifying an iterator:
d in departments
departments d
departments as d
193. DATA TYPE OF QUERY RESULTS
The data type of a query result can be any type defined in
the ODMG model
A query does not have to follow the select … from …
where … format
A persistent name on its own can serve as a query whose
result is a reference to the persistent object.
For example,
departments, whose type is set<Department>
194. PATH EXPRESSIONS
A path expression is used to specify a path to attributes
and objects in an entry point
A path expression starts at a persistent object name
The name will be followed by zero or more dot
connected relationship or attribute names
For example: departments.chair;
195. VIEWS AS NAMED OBJECTS
The define keyword in OQL is used to specify an
identifier for a named query
The name should be unique; if not, the new definition will
replace the existing named query
Once a query definition is created, it will persist until
deleted or redefined
A view definition can include parameters
196. EXAMPLE
A view to include students in a department who have a
minor
define has_minor(dept_name) as select s from s in
students where s.minor_in.dname = dept_name
197. SINGLE ELEMENTS FROM COLLECTIONS
An OQL query returns a collection
OQL's element operator can be used to return a single
element from a singleton collection:
element (select d from d in departments where
d.name = ‘Web Programming’);
If the result is empty or has more than one element, an
exception is raised
198. COLLECTION OPERATORS
OQL supports a number of aggregate operators that can
be applied to query results
The aggregate operators operate over a collection and include:
Min
Max
Count
Sum
Avg
For example:
avg (select s.gpa from s in students where s.class =
‘senior’ and s.majors_in.dname = ‘business’);