Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.

Dbms book-special-notes

2,791 views

Published on

DBMS special notes

Published in: Technology
  • Be the first to comment

Dbms book-special-notes

  1. 1. Chapter FILE ORGANIZATION 3.1 INTRODUCTION A file or anization is a way of arranging the records in a file when the file is stored on secondary storage (disk, tape etc. ). The different ways of‘ arranging the records enable different operations to be carried out efficiently over the file. A database management system supports several file organization techniques. The most important task _()_f_Wl2§. §_ is J, O_§l_1_C;0§€__«‘*. hest organization for each file, based on its use. The_c_)rga_niz_at_ion__Qf, records in a fi_le_i_s _in_fluenc_egl b} _rr§_sLbe taken into consideration while choosing a particular technique. These factors are (a) fast retrival, updation and transfer of records, (b) efficient use of disk space, (c) l'-iEFthrougl1put, "(f5_type of use, (c) efficient manipulation, (f) security from unauthorized access, (g) scalability, (11) reduction in cost, (i) protection from failure. This chapter discusses various file organizations in detail. 3.2 BASIC CONCEPTS OF FILES Q File is a ggflgction of [elated seqger_1ce%<3fr'ecords. A collection of field names and their corresponding data types constitutes a record. A data type, associated with each field, specifies the types of values a field can take. All records in a file are of the same record type. 3.2.1 Records and Record Types Data is generally stored in the form of records. A record is a collection of fields or data items and data items is formed of one or more bytes. Each record has .1 unique identifier called 1fl5_°. T_d£ The records in a file are one of the following two types : (il Fixed Length records. (50 Variable Length records. 47
  2. 2. 3.2.1.1 Fixed Length] R; ::r: :,¢dy the same size (in 531""-'5)‘ The ’°°°‘d 5‘°IS are . fi e Everx. L£‘C9'd '! !’. *l‘9«-—-- ~ I“ and ___é_I_! ‘_€__€in'-‘mg! mb “ id-"d nd_s, ¢1L". ‘.'. Slilhmgv‘ l)IE‘N: ILmcord and the Figure 3 We srL'oi: .’l' = NA[E = Roll N0 = Ch3'(5)3 end a rd The Figure 3.1 shows a smm [en 1 . ' ords 28-02-75 20-05-80 24-10-77 15-09-72 18-1 1-70 09-06-73 j 05-01-81 E FIGURE 3.2. Portion of a file of fired length records. ‘ Advantage of Fixed Length Records 1. Insertion and deletion of records in the file are simple to implement since this? !‘ made available by a deleted record is same as needed to insert a new record Disadvantage of Fixed Length Records uses wastage of precious memory space d . , _ _ _ 1 record, then major changes in P Ple. if it 15 required to increase the leflgdl smaller, it cannot be used and if it is too Sim afethe exact I5 If tfiiylll‘ variable length records due to follow’ 8’ tra space 15 l“5t Wasted’ A file I inthe me. &E°_"££L‘5 idem usi: ::‘N
  3. 3. Fur ORC. /NiIAll()'4 49 3_ One or more than one fields a re optional i. e., tl - . for all records. The file records in this case ‘rt? m1)’ h 4. The file contains records of different record typ Advantage of Variable Length Records 1. It n3dUC8S manual mistakes as database automatically 2. lt saves lot of memory space in case of records of V 3. It is a flexible approach since future enhancements a Disadvantage of Variable Length Records 1. It increases the overhead of DBMS because of all records. ave V-llllI"i for mniio but Itnl also of the same record type, es and different . 'tl7.l"t. adjust the size of record, ariable lengths. re very easy to implement. database have to keep record of the . 'l7.("t 3.2.2 Types of Files The following three types of files are used in database systems : 1. Master file 2. Transaction file 3. Report file. 1. Master file : This file contains information of permanent nature about the entities. The master file act as a source of reference data for processing transactions. They accumulate the information based on the transaction data. 2. Transaction file : This file contains records that describe the activities carried out by the organization. This file is created as a result of processing transactions and preparing transaction documents. These are also used to update the master file permanently. 3. Report file : This file is created by extracting data from the different records to prepare a report e. g. A report file about the weekly sales of a particular item. 3.3 FILE ORGANIZATION TECHNIQUES A file organization is a way of arranging the records in a file when ‘the. file is stored ‘on secondary storage (disk, tape etc). There are different types of file organizations that are usfid by applications. The operations to be performed and the selection of storage device are I 9 maior factors that influence the choice of a particular file organization. The different types Of file organizations are as follows : -' 1. Heap file organization . . 2. Sequential file organization }. Indexed——Sequential_ file organization ‘/4. Hashing or file organization.
  4. 4. 50 nization _ 3'“ Heap File Orga n the filex _in NW’, nifler In w”li/ ,1 they are . - curds are stored at the end of the tile. Tl: il. S.f'l9 | °r8am7-“H0” is also i"5erted' A” {he m“ T0 L mintion l5 3°"e“‘". V used “nth . a(. j mona access, paths’ like called PH‘; HLE: llrh-X 1 new record is very fast and efficient. But S€i_31'Cl"”“8 5' record Sec-ondary lndefefil nilhtliwlriélinx-ol'cs a linear search through the f_'l‘_3' Winch ‘S ? °mP3fative1y using, any search cont 1D «,1 ‘tin 1 J record from a heap file is not efficient as deletion of records more time consunnnb: L c 3, “C. It is gemrally used to store small files or in cases where "e5“"'3d ‘" “"‘m3° of *“°'. “*"" SP‘. . ~ ' l data is collected at one Place Pri data is difficult to organize. It IS also used W W? “ °’ to processing. . . , - - re stored i In ghis file organization, thu. records .1 Advantages of Heap Fih: Organization 1. Insertion of new record is fast and efficient. 2. The filling factor of this file organization is 100%. 3. Space is fully utilized and conserved. Disadmrzragc of Heap File Orgmrimtion . Searching and accessing of records is very slow. Ix)" Deletion of many records result in wastage of space. Updation cost of data is comparatively high. It has limited applications. : ‘*'. ~*’ 3.3.2 Sequential File Organization k9.'"- A mg is an attribute or a set of attrib records. It 15 not necessary file is shown in Figure 3.3 The rec order one after another To reach at t nse ‘ ' ' —. #.l’1“-¢£L_<iu1nyere The Pointers are used for fast retriev . al of reco ds, _: ‘j§. Beginning of me ords are stored/ in sequential _Rgmters are used-
  5. 5. Fits Ow. /-. -.~yrsc, ~«. ' 51 To react any record from the file start searching from the very first record with the help . _ ‘ - -_ Sec uenthl ll e or ' - >————a——-. _—. __~. ._. _._m_§ or scaml 9.‘ l ‘ gamzation giycs rccor 9 in sorted form. This organization is v""""" . ' ‘ , ‘ use-d ix ifIlcs. 3.3.2.1 File OP°‘'3ti°“5 011 Sequentially Organized Files "ari0U5 °P9mtl°“5 that Ca“ be P9Fl0TmL'd on sequential files are as follows‘ ii) Crmling H 5<’1l"¢”1llfllfilo. ’ : To create a new file, first free space is searched in memory After Searchmgr e"°'~‘8h SP3“? l0F mt‘. allocate that space to the new file ’ame ancl physical location of new file is entered in file system directory. Search free space If found l Allocate s ace for new nle ' Enter name and physical location of new file In file system directory FIGURE 3.4. Steps of creating a sequential file. enter the name of file. This name is searched records in the file are transferred the file is ready to read / write. (b) Open um existing file : To open any file, in file system directory. After matching the name, from secondary memory to primary memory. Now, (C) Closing a file : When task over file is completed then close that file. The memory space allocated for that file in primary memory is deallocated. lf file was opened automatically with any programme then it will be closed automatically after ending that programme. (:1) Reading a sequential file : To rea start from beginning of file until recor directly to any particular record. Only linear search can d any record from a sequential file it is necessary to d is find or EOF is occured. You cannot go be used to search any record. Beginning Current E“dl"‘9 ol file position of file (EOF) . ___————-? ————: —————> Reading at file FIGURE 3.5. Reading of sequential file. , ‘ "I . Em sequential file system. update, delete, insert and append cannot be performed on any r= ‘€0’“l 1""! 5' There 5 a procedure to perform ‘all ‘these operations. ""“‘ ”‘S“rx‘1~n~——-§__. -_, _, _ _
  6. 6. New master file FIGURE 3.6. Updation of sequential file. Procedure to Update File : Step 1. Original file is known as Old Master file. Make a new file which is hm as New Master file as shown in Figure 3.6. Step 2. To update 12"‘ record, First copy all the previous n—1 records to new master 53¢ Step 3. Make necessary updations to 11"‘ record. Step 4. Now copy all remaining records including updated nth record to New mast? ‘ file and old master file is deleted.
  7. 7. —m The mvfib in r 0'vr; mr/ prnu 53 that “Nd In be ndcled nrv mllm-tml ln 'l‘mmwntm Me Me zlvr/ In in Figuar '53 Transaction I'll. Mm - ii] :1 '*-"nay ‘/ mm l Ami: 9,01)! ’ FIGURE 3.8. Tnlmurfinn file in mid nwmjg The EM1’-ID is primary key. Now, add records; in New Mimtt. -r fill: as shrrwn in Figure 3.9. If riniar ' ke of record in Old Mamler file in lmr-. than nvrmd in T{aru, a(1j(; n me _u1en first copy record of Old Master File into New Mimltrr file and -. rir_;3 -/1553,’ This process is repeated until EOF is rmched in ‘both files. . -_'. —-—--"""—'_"‘——*'- Emp-records (New Master File) 15,000 7,000 0 3,000 . 10,000 9,000 FIGURE 3.9. New master file fbr employee records after addition. Lm¢umuacfion file are deleted after adding the records. u want to delete some records from Emprecords (as (f) Deletion of records : Suppose yo shown in Figure 3.9) which are no more required. Then L-mp-records shown in Figure 3.9 is taken as old master file. Records that need to be deleted are stored in Transaction file. FIGURE. 3.10. Transaction file to delete records. Prima Ke teLfil§_. is. _matched _v1i_tl1_primary key of each record in key‘ is not matched then record is copied from Old Master fansaction fi1e. d that record. The process is repeated until 51? tQ_New ‘Master file otherwise discar d. After this process. Old Master file and transaction file are deleted. Emp-records (New Master File) HGURE 3.11. New master file for employee records after deletion.
  8. 8. mam Svsrw 54 INTRODUCTION TO DATABASE MANAG . . . . t to modif)’ any "-’°°'d °f empbyee‘ Ciluu (S) Ii: /!n‘: df"{; P(: e‘: ‘!<’)‘: ',clso"ir: e‘f: ’i’; ilfre SI5‘.11,lP(t: eOyl(: iu itivztler file. Records that need to be m0<‘-lified stored in Transaction file with changed V“l"°5' Transaction File finzm 5"“ Sun" "'°°° FIGURE 3 12. Tmnsactionlflle to modifi Itcords. Prima ke of Old MasteLfi.1e, is_matche_d_wit1Lprimary. ,key.0§ea. c1; . r_e__c9rg. i__r3 fileg Erimag key is matched then modified record from Tl'«'mS3CU0n f1li¢, l b New Master file otherwise record fr0m, Q1d-Ma5.tE_3! file is C°Pl9d mm Ne‘V_M“$t£’ 51$-_Tl‘| € proch is repeated until EOF in—CEiTlviaster file is reached. After this process, Old Master me and transaction file are deleted. Advantages It is easy to__understand. Efficient file system for small size files. .1. ,2. V3. Construction and reconstruction of files are much ea /4. . /5. sier in comparison to other Eh systems. Supports tape media, editors and compilers. It contains sorted records. Disadvantages ’ ’ .1. Inefficient file system for medium and large size files. ,2. Updations and maintenance are not easy. Inefficient use of storage space because of fixed size blocks. / 4. Linear search takes more time.
  9. 9. Fur ORGANIZATION 55 FIGURE 3.13. Index sequential file uy3,, ,,, ',, ,,, ',, ,,_ 35.3.1 Componenb of an Indexed Sequential File ‘-——.2 "9' ‘ V ' ‘ . ~'-_--. _.. .Durin~th - . . ~ _ 2 _e tC’_‘3“£: l°“ Of mdQ<_S. §qUCntla_l_fflc, records are wntten in prime . ev: ~:: _< ve main ‘ - s u am according to any key. Hence, prime area must be a 3 3-3-_ _ : : rs: : Overflow area is used during addition of new records There are two -‘es of overriow area : _: ‘t ‘.3Z. '-'_: :'r scarf 0:1: area : This area consists of free area on each cylinder which is . -saved tor overflow records for that particular cylinder. 5* '- "~""—"'. .f; ‘r: f overflow area : This area is used to store overflow records from - . .--__. -' : .'_r. ::fer; =tics : Usually fixed length records are used. _' -_-. . ‘ . "°_: ':. :; : Index is a collection of entries and each entry corresponds to block of storage. -:7 I-""‘ is: -3! index : The lowest level index is known as first level_ index. -- -. .‘; .. ---. . "-7 ‘-'1‘; --' ~'-rv’ inriarc : Higher level indexing is used when first level index becomes :5 large. ' " "—*- : ‘r: .i'ex : The indexes can be made according to the hardware boundaries. . .:: . I. ., Gd-A . .a-. Cjflirzcler index entries consists one entry for each cylinder. "'<'= ' index : The highest level of index is known as master index. 1" ‘:32 Operations on Index Sequential Files 9:‘ ‘: ":'2.. ’L’Z{ . z l. ".. .I': 'I seqiiential file : After allocating free space and make necessary entries 7"‘ ‘ls svstem directorv, all records are written into prime area in sequential manner » -0.. . s-. —~ 1.. .. accczding to key value. .5‘ and closing an existing file : It is same as sequential file operations. : To search any record 55} R—. t.. "‘= ":'. *:g: records from a index sequential file (or setircliing any record) mh enter the key value of record _then first search the block of record in index then sea 3?-‘Cord sequentially within the block. '5) . ‘-! »:d: '+Er; z:ion of records : To, modify a record, first search that re ‘-7- Updated record is stored at same position if it has same r C; 'e'r. i. cord and then m0dif. V key Vajue and size | _;, ll
  10. 10. ANAGEMENT SYSTEM 56 I~TRoouc‘nON T0 DATABASE M , - ' ' 1 record is deleted d aqua; to the original record. Othemise onglna an new racer}: inserted. (5) D‘-Igtimx ft’COl'tl$ . ' T0 delet i e a record, fi1'5t Seam: that£ec°_r: ' Sgzcific Codes are i. to indicate deleted records. Records consists of ags P80 C C 9 15 Inserted toe. ‘ d l t d. flag of that record that needs to be 8 3 9 (f) ; ,,_s-mun of records : To‘ insert a record, first find the desired block according to by value. Then confirm that record does not . eX: tb: l)Tc9:dY- The TeC0fd is overflow area and then copied to the desire . . -iiizvantages (1) Efficient file system for medium and large size files. .(. -3) Easy to update. (fir) Easy to maintain than direct files. . (iv) Efficient use of storage space. (3) Searching of records are fast. ' (:9 . laintain advantages of sequential file system. Disadvantages , (0 Inefficient file system for small size files. ’ (517 It is expensive method. -(ii! ) Typical structure than sequential files. ' ‘ * Ji? ) Indexes need additional storage space, (3') Performance degradation w. r.t. growth of files. (it) Folding method : In th' then told them.
  11. 11. 32 01. ‘(W9 L->| _A: c.‘ess ' noun: 3.151 sew, Ir (b) Shift folding : In this technique various parts of key ~~ . . . , ' you can say that they are sunply added as shown in . :--. ::e 3 ‘-5 m Address FIGURE 3.16. 95-’: 5.3»; (iii) Division method : In division method, divide the -. -alt: -3 '. -v'i‘_h . =_; ~._~. ~ and take quotient as address of that record. Mostly prim. :1u: ;:': e_r is as 55.-; _=, c(-_ Consider a record having key value 550 which Ls divided by 5 gves 110 as quct. ‘e: .t which is taken as address of that record. (iv) Division-remainder method : In division-remainder method, divide key value with any number (Prime number) and take remainder as address of ‘that The divisor must be greater than total number of records and st: -.311 7:: -".911 key values. Suppose there are 100 records in any file. A record has key value 56-‘) -. -.": ‘.iC. "t is divided by 109 gives 6 as remainder which is taken as address of that record. (v) Radix transformation method (Radix conversion method) : In radix conversion method, first key is converted into binary string. Then this string is partitioned into small strings. Then these strings are converted into dedma format. Suppose. a record has key value 269 whose binary equivalent is 100001101. Take small strings of 3 bits each. These are 100, 001, 101. Now convert them into decimal format which givs 4, 1, 5. These digits are treated as numbers with radix r. Suppose r is S- Then addressis 4x52+1X51-'2-5X5C=110 (00 Polynomial conversion method : In polynomial conversion method every digt of key value is taken as coefficient of polynomial. After obtaining that polynomial. difide It by another polynorpial. The remainder polynomial gives the basis of address of that record. L (vii) Truncation method : In truncation method, some digits of key value are They may be left most digits, right most digits or central digits. The remaining d‘S“5 are taken as address of record. Suppose a record is having key value 5-131S9- The“ truncate first two left most digits, which gives 3189 as address for that record- lviill Conversion using digital gates : In this method, convert key value into bi! ‘-17?’ f°"_“: : then apply different operations on these digits and at last convert the 195"" "‘ decimal format as shown in ‘Figure 3.17.
  12. 12. l AGEMENT SYSTEM 58 lnmooucnon To D"7"‘“5‘ Mm FIGURE 3.17. Conversion using digital gates. 33.-Ll Collision and 5)’“°"Y"‘5 111:: main dis1d'.1nt. n;e of hashing is collision. Co. ’Ir'. <:'mr . - A collision occurs when a hash function f maPP9d more than one key value_ same ph_'sic.1l addmga. rt There are two main collision resolution (1') Open Addressing (ix) Overflow Chaining. technique : Employee _
  13. 13. lb) 59 FIGURE 3.18. Linear Pfflblflg, For eniployec . -Iand. Dept-lD is mapped into 1. O o For en1ploya. ~ R. l'I. ‘$l‘I, Dept-ll) is mapped into 2_ o For cxnplovev. ‘ l‘i'u. <h. Dept-ID is mapped into 3. o For crnployee Deepak, Dept-ID is mapped into 2. A collision occurs because location 2 is not empty. So, searching is started from location 3. Location 4 IS empty which is given to Deepak. o For eniployt-e Manoj, again Dept-ID is mapped into 2. A collision occurs because location 2 is not empty. So, searching is started for an empty location. Here Block- I is full so searching is started in consecutive block, Block-ll and location 5 is given to . l; moj. Disadvantages (:1 Creation of cluster of records (clustering effect). (ii) Searching time for free location is more. Rc‘: .z<}: ing : To avoid clustering effect in linear probing more than one hashing functions can be‘-used. If first hash function results collision then another hash function is used to resolve collision. But this method results extra overhead. (ii) _Ot~crflow Chaining : In this method, there are two ‘areas for storing. records, primary area in which record is stored in normal conditions, if any collision occurs then synonym key record is stored in overflow area. Consider the example of relation employee shown in Figure 3.18. Now the employee Deepak and Manoj are stored as shown in Figure 3.19. Ea] mm Primary area Overflow area FIGURE 3.19. Overflow chaining, Advantages 1‘ Less Searching time required. : N0 clustering effect. ‘ Mme efficient than open addressing method. ——_r r-~1 r "" ‘- -- -. ... .:__. ... ____. ._-_. ... ... ,., ____', .,. .
  14. 14. l‘JTRODUCTlO’~1 ro Dime; -.c f-/1,u. ;r. iv/ W: ‘H‘«f*'/ 60 3,3,5 Direct I-‘ile Organization To meet the requirement to access records randomly din-u file 4.ry_. irii'/ ..itiori is used. In direq ds can be stored Miywlusrie in ti/ I711’/ '1' arm but ram lit‘ «ICCCS-‘€(l dlfcctly, file organization recor _ ‘ l_ _ _ ’ | Searching, It ()'. 'l. '!(lIHiI". the Ilr. :'/ /lmrlrr. til ~*.4'qI| f'f| lMl, "W U‘ -9‘-qllcntlal "without any sequentia and B-trees file organization. For an efficient organimtion and d; ,._. ,J , ,,“1.; ,_' U, Wh, mi, ,,. ,] y(_-(()l'(l, some mapPl“8 01’ transfonnafion Procedure is needed thin C0”. /,‘. "_. , ha’: [inch] 1,! ,3 H‘! l; f(l lfllf) ll. “ pl'IySlCfll SlOl'age _ . _ ' , ~ location. Actually, direct file organizititfl depends upon haaiiirig that provides the ofgmaiiping ggcgdurg. To overcome the drawbacks of ll£i: l_l‘l_n; V;V algurithrn, crilliniriri rt. 'F't()lll_l_i(if1l(! Cl’1nlqUe is needed. Devices that support direct access are Life», floppy i, -tc. Direct file organization is ai_ls_o knovm as l’-: 'ifl(lI"l'ff}_‘_l_'l_l‘_7__(_)vfZi)nl7.f'fll_)_f_1'. Choice of hashing algorithm and collision resolution technique is crucial point in direct file organization. 3.3.5.1 File Operations on Direct Files ((1) Creating a direct file : After searching and allricating free space for file, necessary entries in system directory are made. in direct file or; ;:ini7,ation, a hashing algorithm for mapping procedure and any collision rrzsolutirin tr. -rlrinique to avoid collisions during mapping are specified. The key field is also ’)p‘fCifl('d (It may not be primary key), (b) Open an existing file and closing a file are same as in other file organizations. (c) Searching (Reading or retrieving) from direct file : To read any record from direct file justl enter the key field of that record. With the help of hashing algorithm that key field: is mapped intophysical location of that record. in case of any collision, collisionl resolution technique is used. I (d) Updation of records in direct file : (i) Adding a new record : To add a new record in direct file specify its key fem Wm‘. I i . the hell’ 0‘ mapping procedure a d ll‘ ‘ . - - address location for that record. n C0 lslon rcsolutlon tcchmque’ get the fmei (ii) Deleting record from direct file : To delete a Searching. change its status code to delc (171) Modify any record : To modify any necessary modifications. Then re- Advantages l record, first search that record and after§' ted or vacant. i l ’§C°| 'd, first search that record, then make the; wnte the modified record to the same location-‘ll . 2. It gives fastest retrieval of records v3. It gives efficient use of memory 4. Operations on direct file are fast so in a file, as in sequential file system search keys as in B—trees. 5_ Supports fast storage devices. i-¢rirs—-—a; s.. . IL) (W<l—. ‘
  15. 15. ->--'' . . 61 FILE ORGANIZATION Di5adz'a7ltflg€5 l. 1 4. 3.4 ; —{{' _. _n index is a collection of data entries which is used to locate a record in a file. Index table m-ords consist of two parts, the first part consists of value of prime or non-prime attributes of file record known 35 l"de"i“3 field. and: the Second part consists of a pointer to the n where the record is physically stored 'in memory. In general, index table is like the lttatio pa 1 W-l5t~‘8" °f storage Space (Clustering) if hashing algorithm is not chosen properly It does not support sequential storage devices_ Direct file system is complex and hence expen5iw_-_ ““'T‘ er“ Extra overhead due to collision resolution techniques. INDEXING --‘-7-I‘ rr r-pg-I m p——. _—-. .—- index of a book, that consists of the name of topic and the page number. During searching ‘ of .2 file record, index is searched to locate the record memory address instead of searching i a record in secondary memory. On the basis of properties that affect the efficiency of searching, the indexes can be classified into two categories. . _.-_. _ 1. Ordered indexing . 1 I-lashed indexing; 3.4.1 Ordered Indexing In ordered indexing, records of file are stored in some sorted orderjnphysical memogy. The values in the indK'are ordered (sorted) so that binary search ‘can be performed on the index. Ordered indexes can be divided into two categories. 1. Dense indexing 2. Sparse indexing. 3.4.1.1 Dense and Sparse Indexing 0 Dense index : l_n___d_e;1se indexing there is a record in index table for each unique value of the search-key attribute of file and a pointer to the first data record with that value. The other records with the same value of search-key attribute are stored sequentially after the first record. The order of data entries in the index differs from the order of V data records as shown in Figure 3.20. Advantages of Dense index (0 It is efficient technique for small and medium sized data files. (ll) Searching is comparatively fast and efficient. Disadvantages of Dense index (0 Index table is large and require more memory space. ('1) lnsertion and deletion is comparatively complex. (“'3 1n~efficient for large data files.
  16. 16. 62 M-Rooucnon T0 DATABASE MANAGEMENT SYSTEM pointer to physical memory address of 31 of all prime- e record of d 1(9)’ attributes of a table and ‘ ata file. To retrieve a record °“_ Inn-. .'. a.. ... ... V.. ._. ... ... ... ... ... ... ... . . ._ _
  17. 17. ' FILE ORGANIZATION 63 the basis of all primary'l<€y attributes. primary index is used for fast searching. A binary search is done on mdex table and then directly retrieve that record from physical memory. It may be sparse. Advantages of Primary index /0) Search operation is very fast. _ (fij Index table record is usually smaller. (, ',', ) A primary index is guaranteed not to duplicate. Disadvantages of Primary index (1) There is only one primary index of a table. To search a record on less than all prime-key attributes, linear search is performed on index table. (ii) To create a primary index of an existing table, records should be in some sequential order otherwise database is required to be adjusted. 0 Secondary index : A secondary index provides a secondary means of accessing a data file. A secondary index may be on a candidate key field or on non-prime key attributes of a table. To retrieve a record on the basis of non-prime key attributes, secondary index can be used for fast searching. Secondary index must be dense with a index entry for every search key value and a pointer to every record in a file. Advantages of Secondary index (1') Improve search time if search on non-prime key attributes. (ii) A data file can have more than one secondary index. Disadvantages of Secondary index (i) A secondary index usually needs more storage space. (it) Search time is more than primary index. (iii) They impose a significant overhead on the modification of database. Search-key attribute for Secondary index Search-key for secondary index FIGURE 3.21. Primary and secondary index.
  18. 18. 64 lrmzooucnon TO DATAme MANAGEMENT Svsrsm and Multilevel Indexes 0 Single level inrlrxes : A single stage index for a data file is known as single level ' I A single level index cannot be divided. It is useful in small and medium size d lnd. If the file size is bigger. then single level, indexing is not a efficient method Sfita fl if is faster than other indexes for small size data files. ' arc ' o Multilevel imIc. i-cs : A single index for a large size data file increases th . table and increases the search time that results in slower searche T: size of ind multilevel indexes is that, a single level_index is divided into In 8 idea behi . reduces search time. " ”““‘ ~ u hple levels, whic In multilevel indexes, the first level index consists of two fields, the first field Consjs of a value of search key attributes and a secon ' d f1 Id ‘ . (or second level index) which consists that value: an: l0ns1sts of a pointer to the b] . so on. ‘ 3.4.1.4 Single To search a record in multilevel ind ' I b I ' the small values or equal to the on: xtl‘i§: laf)’dscarch is used to find the largest of nee s to be searched Th . a . e pointer points i a block of the inner index After r ' . - hm to th d ' searched (in case of two-level inde ‘la; g e eslred block the d ' xmg) otherwise again the la . esired record . rgest of the small va] : ._» °1' equal to the one that nee ds to be searche
  19. 19. Fu. £ Oacmxzmou 65 3 42 Hashed Indexing T we 9 the disadvantages of ordered indexing, a hash index can be created for a data fife Hashing allow us to avoid accessing an index structure. A hashed index» consists of two 5 alas, the first field consists of $arc_l1_l_<ey. attribute values and second field consists ofiifinter fhghash file structure. Hashed Indexing IS based on values of records being uniformly ; °.mb, ,,ga using a hashed function. ' :5 35 sane: mosx FILES )0 ; #-j—— tree structure is used. A B-tree of order m is an m—wav search tree with 1,, 5.Tree index files, the following properties. 1. Each node of the tree, except the root and leaves, has at least subtrees and no more than n slubtrees. It ensures that each node of tree is at least‘ half full. 2. The root of the tree has at least two subtrees, unless it is itself a leaf. It forces the tree tobranch early. 3. All leaves of the tree are on the same level. It keeps the tree nearly balanced. To overcome the performance degradation w. r.t. growth of files in index sequential files B-Tree index files are used. It is a kind of multilevel ind_ex file organisation. In tree structure, search starts from the root node and stops at'leaf node. (i) Non-Leaf Node : ln non-leaf node, there are two pointers. Pointer P, points to any other node. Pointer R, points to the block of actual records or storage area of records. K, represents the key value. _ (in L_2a_f. Nqde { in leaf node, there is only one pointer P, which points to block of actual records or 5l°F3ge area of records. K, represents the key value. A B-tree for file employee is shown in Figure 3.23. , Amy cmrag Harish Lalit Praveen bucket bucket bucket bucket bucket FIGURE 3.23. B-tree fin file Employee.
  20. 20. oF DIFFERENT FILE ORGANIZATIONS ON 3_1 COMPARIS Random retrieval on i I pfima. -y key is impractica . 11-iere is no wasted space For data. Sequential retrieval on primary key is very fast. Multiple key retrieval in sequential file organization is possible. Updating of records Hashed/ Direct seq Random retrieval of Random retrieval of pfimary key is moderately Fast. No wasted space For data but there is extra space for index. Sequential retrieval on primary key is moderately Fast. Multiple key retrieval is very Fast with multiple indexes. Updating of records primary key is very fast- Extra space for addition and deletion of records, Sequential retrieval of primary key is impractical Multiple key retrieval is not possible. Updating of records is the Ememll)’ require‘ 'eW"itl"'8 easiest one. requires maintenance of" the file. Addition of new records _ _ _ Addition of new records requlrs rewriung the file. _ _ Addition of‘ new records is is easy and requires very easy. Deletion of records can Deletion of records is very
  21. 21. N P‘9': “E“! ~‘ 9° (W7 . / FILE Oacamzanon 69 complexity : If technique is fast then it is complex and expensive. Whether funds are available to adopt new techniques. (1,, -9 Availability of hardware I The hardware that supports file organization. Example tape reader supports only sequential file organization. EXERCISES . ;—*’ What is an index on a file of records ? What is a search key of an index ? Why do we need indexes? Explain index sequential file organization with example. What do you mean by file organization? Describe various ways to organize a file. List two advantages and two disadvantages of indexed-sequential file organization. List out advantages and disadvantages of variable-length records and fixed—length records. Discuss disadvantages of using sequential file organization. How do you overcome those problems with direct-indexed file organization. List advantages and disadvantages of direct file lccess. What do you mean by collision ? Discuss techniques to avoid collision. 9. Define hashing. Discuss various hashing techniques. 10. 11. 13. 14. 15. 16. 17. Define B-trees. How insertion and deletion is made in B-trees? Explain. What is hashing and index—sequential file organization ? Give an example of an application of each. Compare and contrast sequential and index sequential file organizations. Why is a B’-tree a better structure than a B—tree for implementation of an indexed sequential file? Discuss. Explain the structure of an index sequential file organization, with a suitable diagram. Write three differences between index sequential and B-tree file organizations. Why can we have a maximum of one primary or clustering index on a file but several secondary indices ? Your explanation should include an example. What is the difference between primary index and secondary index? What is non-dense indexing? Construct a B’-tree for the following set of key values. (2, 3, 5, 7, 11, 17, 19. 23, 29, 31) Assume that the tree is initially empty and the number of pointer that will fit in one node is four or six. l"‘*"--. -.. ..__
  22. 22. Fu~cno-AL Dswzuoencv AND N0RMMJ5*T‘°"‘ 111 (iii) Reducing the Null Values in tup1e5_ (W, Nut allowing tho possibility of genemh-n 0) Meaning of the relation attributes , ,_. lation sclia. -ma, it is assumed that real-wurd nit. -aning g Spurious tuples, 1 When the attributes are grouped to f0|'m 3 attributes belonging to one relation have certain and .1 . - . . . , . (9 nnnmw (‘P Cm h Fliropir interpretation associated with them. This meaning . .' 4 1.‘ us ow - _ , . tie Jllflbuit ialues in a tuplc relate to one another. The conceptual (it-sigli should jmvc J ck . , _ . . _ ‘or meaning, if it is done carefully followed b a ‘V""m‘"'C m“PP’”g ""0 l’t'lJli0ns and most of the semantics will hiv an act uniied ' ’ « . ' I . . c c’ 0 l . Thus th - -a_ ' - ' ' . - , . . . Ur L “Sr " '5’ m "‘Pl*““ th'. meaning of the relation, the better the relation schi, -ni. i design will be, CUIDELINE 1 : Dc’ ‘ j - ema that corresponds to one entity type or one relationship type has a straight forward meaning. (ii) Redundant information in tuples and update anomalies : One major goal of schema design is to minimize the storage space needed by the base relations. A significant effect on storage space occurred, when we group attributes into relation schemas. The second major problem, when we use relations as base relations is the problem of update anomalies. There are mainly three types of update anomalies in a relation i. e., insertion Anomalies, deletion anomalies and modification Anomalies. These anomalies are discussed in the section 6.3 in detail. CUIDELINE 2 : Design the base relation schema in such a way that no updation anomalies (insertion, deletion and modification) are present in the relations. If present, note them and make sure that the programs that update the database will operate correctly. (iii) Null values in tuples : When many attributes are grouped together into a "fat” relation and many of the attributes do not apply to all tuples in the relation, then there exists manv NULL'S in those tuples. This wastes a lot of space. it is also not possible to understand the meaning of the attributes having NULL Values. Another problem occur when specifying the join operation. One major and most important problem with Null’s is how to account for them when aggregate operation (i. e., COUNT or SUM) are applied. The Null’s can have multiple interpretations, likez" fa) The attribute does not apply to this F1119- fb) The attribute is unknown for this tuple. (5) The value is known but not present i. e., cannot be recorded. GU1DE]_[NE 3 ; Try to avoid, placing the attributes in a base relation whose value may usually be NULL. lf Null’s are unavoidable, make sure that apply in exceptional cases only and majority of the tuples must have some not NULL Value. (iv) Generation of spurious tuples : The Decomposition of a relation schema R into two relations R] and R2 is undesirable, because if we join them back using NATURAL . - 0 - . . ) ' t join, We do not get the correct originqldinfojrmation. The join OpLl'£'! t1.0ni'g1C: e€CI': liOE: spurious mpjes that represent the invai information. More about jolfl (6.4.2.6). ‘_. _ -
  23. 23. 12 lmowmw To DATANE MANAGEMENT SYSTEM 1 GUIDELINE 4 : The relation Schemas are designed in such a way that they can be joined with equality conditions on attributes that are either primary key or fungi key_ This guarantees that no spurious tuples will be generated. Matching attributes in relations that are not (foreign key, primary key) combinations must be avoided, because joining on such attributes may produce spurious tuples. 6.3 FUNCTIONAL DEPENDENCIES Functional dependencies are the result of interrelationship between attributes or in between tuples in any relation. Definition : In relation R, X and Y are the two subsets of the set of attributes, Y is said to be functionally dependent on X if a given value of X (all attributs in X) uniquely determines the value of Y (all attributes in Y). It is denoted by X —> Y(Y depends upon X). Determinant : Here X is known as determinant of functional dependency. Consider the example of Employee relation : Employee X (EID) : Y (Name, Salary) The determinant is EID
  24. 24. 0 Fully funcliarml dependency : Let A be the non- prime key attribute and value of A is is said to be fully functional dependent. g prime key attributes (Ro1lNo and Game) and non- dependent upon all prime key attributes. Then A Consider a relation student havin prime key attributes (Grade, Name and Fee). It represents fee of particular game FIGURE 6 3 Functional dependency chart ‘showing Partial and Fully fimctional dependency of student relation. As h . F. e 6 3 Name and Fee are partially dependent because you can find the s own in igur . I name of Student by his Ro1]No, and fee of any game by name of the game. find the grade of any student in Grade is fully functionally (“Pendent because you can P - 1 d d ' d a particular game if you know RollNo. and Game of that student. artia epen ency LS ue *0 more than one prime 1(9)’ attribute’ ‘ti D endency 2. Transitive DependeI| CY ‘“'d No" Trans‘ ve ep . . ' d t d enden between non’P'i"‘° 0 Transitive dependency : Transitive dependency IS ue 0 ep Cy . - R, —> Y (Y depends upon X), .Y-* Z (2 d°_P_°’'ds k9)’ 3"TibUt9$- 5‘1PP°se In acielaethiins upin X). Therefore, 2 is said to bé 'T3"5““’°‘-V upon Y), then X -’. Z . (z ep > dependent upon X- : ,-¢. _jer1 . - --q R]
  25. 25. imnooucnon 1'0 DATABASE MANAGEMENT SYSTEM 1 14 ’ N(m_-m, ,$, },'p¢- dependency : Any functional dependency which is not transitive is known as Non-Transitive depend€“CY- )Ion_-rn. u.lsifive dependency exists if there is no dependency between non-prime key attributes. consider a relation student (whose Functionatdependency chart is shown in Figure 6.4) having prime key attribute (RollNo) and non-prime key attributes (Name, Semester, I-Io5ue1)_ Dependency between non-prime key attributes FIGURE 6.4. Functional dependency chart showing transitive and non-transitive dependenq on relation student. For each semster there is different hostel Here Hostel is transitively dependent upon RollNo. Semester of any student can be find by his RollNo. Hostel can be find out by semester of student. Here, Name is non-transitively dependent upon RollNo. . Single Valued Dependency and Multivalued Dependency in Single valued dependency : In any relation R, if for a particular value of X, Y has single value then it is known as single valued dependency. 0 Multivnlued dependency (MVD) : In any relation R, if for a particular value of X, Y has more then one value, then it is kn own as multivalued dependency. It is denoted by X ->—> Y. Computer Computer Electronics Mechanical Mechanical % 1/, lb) FIGURE 6.5. Functional dependen gr chart showing single valued and multivalued dependency on relation teacher.
  26. 26. ? Fuacmuu Du-oooacvno Nolu«usAfl0"¢ "5 Til"“"'D“""'UI§uhaauiOmbux: ni-cAhndurr&ninlIuI¢! d—~ | &l""""“"D"*'*UuChnuulhpha~n-rachurmhcmuuueinmuudly. n"l'n'*"*""“""7t-4unm! I)udTmcfrrhr(: -ne¢dIhId'Th5‘ in-F’ gfiivdlhpafiurynlflco-Ittvullhpnluuy arm-HT‘ In-rvwi-nrmlI-vumnnrv; xi_Y-dunno-id! ) o. --1'-= -‘H’ In-nwu-nIL -vuunun-an Ilfibnilrnhauldll L'. vuuJnU'v*na; ~y-hnl‘ruhx1u-tserviuhrv-wimfgurhh d; -Mt ‘Q t (3:45. up gm i ; I -‘ (chat ; up : no i 3 Ivan i‘ to I so _; ‘ °- 1 tr ! is s . ~ 5 uni-«.5 ’ so I 99 5 Has 0 I I03 - I . .. 1 _—¢. .-u. ‘, .-_. ... .._. .._. .,. e.. J urgmcauuwnuucauun Mrn. GI, Ni vs-omnl In 50 ‘A016: ID 5 35min! I! I-4 A? Il. CIQjAY'fN.31’IIAQ Thain: -g§; g . n.. n.. .=. ..‘. u;udnlI. unhuu-d-lcuh-n- ‘¢un-thy: -unhvruunublhrul W tau-r than uurvuzas A--nu-. .! 'uIrnsh: -a no-5:0-to-nest! -I4-r-~J~! '-0'-*""“_"""'-"*"-1!". Cwunkw lhr n-uni-ilngve-. w uoan-a~in» Nutc an-10¢! -Va D~v-N-mr, - *%‘H| fl". flfF. . tin-unrunpuunub-nun-llslh-i (jg w * . ‘ 1. Dad ; 2:» ; o; i . ‘ y Iuoau 1 : v‘ ‘R’. . *1 till; I ‘ Auonans 3 '. .u---; « 9.41.: " : ':'£‘ l Si" v C sans-9 9w‘-U ‘ "'5 ’ . u‘"‘'"'‘ 1 ~ -- 9 S van: uni ‘no 5 M u‘'‘'‘'"‘ i j ‘ , E p. is can 2 . known I 2 7 ' uh he--I "flu S ’ “‘"""" ‘ 9 to , Dane-an 5 ' ; no 2 : Auto. -u 5g. ..‘ 5.. ; I I was ' , ..'. ".‘____. _.L . ... L._-__, .__. .._. .l " in-Id muggaj-d: t1 *'“'-"< - -
  27. 27. DATABASE MANAGEMENT SYSTEM cnoN T0 115 INTRODU . 5 41 insertion A"°'“"" want to add new information in any relation but cannot enter that da S;1l; §:: : ': .::5u-aints. This is known as Insertion anomaly. ‘ In relation Employee, you cannot odd new departinent Finance unless there is an employee in Finance department. Addition of : his information violates Entity Integrity Rule 1: (Primary Key cannot be NULL): In other words when you depend on any other information to add new information then it leads to insertion anomal)‘ . " 6.4.2 Deletion Anomal)’ The deletion anomaly occurs when you try to delete any existing information from any relation and this causes deletion of any other undesirable information. In relation Employee, if you try to delete tuple containg Deepak this leads to the deletion of department "Sales" completely (there is only one employee in sales department). 6.4.3 Updation Anomaly The updation anomaly occurs when you try to update any existing information in any relation and this causes inconsistency of data. ln relation Employee, if you change the Dept. No. of department Accounts. E'R We can update only one tuple at (1 time. This will cause inconsistency if you update Dept. No. of single employee only Otherwise you have to search all employees working in Accounts department and update them individually. 6.5 NORMALIZATION . :j—j— _Normalization is a process by which we can decompose or divide any relation into more than one relation to remove anomalies in relational databa_s§_. -‘: j"*? It is a ste b ste rocess and each step is known as Normal Form. Normalization is a reversible process. 6.5.1 Properties of Normalization 1. Remove different anomalies. 2. Decomposition must be lossless. 3. Preserve necessary dependency. 4. Reduce redundancy. 6.5.2.1 First Normal Form (INF) A relation is in first normal form if domain of each attribute contains only atomic V31“°5 It means atomicity must be present in relation.
  28. 28. Consider the relation FUNCTIONAL D an Em 1 enosncv mo NORMAUSATION 117 P Oyec as 5 use attribute Na - . hown in Fi - . beca me as not alomnc. so, div gure 6.7. It 15 not in first normal form Name as shown in Figure (, _3_ Employee . - Badminton Cricket Badminton Hockey ide it ' t . _ in 0 two attnbutes First Name and Last Dependency Chart (b) FIGURE 6.9. Student relation with anomalies. —— T ’— ' Ill"l"‘“nv'-— 4- -A A
  29. 29. SYSTEM DATABASE MANAGEMENT 1 1 8 INrRoDUC11°" 7° _ d i artici ate in more than The P . ,3, Key is (RollNo. . Game) 533‘ _5'“ en P P °“° Shrug - dent is in 1NF but still contains anomalies- Relatm? émmwmayy . Suppose you want to delete student Iack. Here you loose ' 1. Demo" ' about game Hockey because he is the only P13Y‘-9" Pamclpated 1“ h°°k9Y- 2_ Inqcrtzhrl anomaly : Suppose you want to add a new game Basket Ball having no stud“ participated in it. You cannot add this information unless there is a for it. _ 3_ updation anomaly : Suppose you want to change Fee of Cncket. Here, you have Search all the students participated in cricket and update fee individually or}, - it produces inconsistency. ' Pl3Yei to The solution of this problem is to separate Partial dependencies and Fully fiinctiomi dependencies. So, divide Student relation into three relations Student(Ro11No. . Name), Games (Game, Fee) and Performance(RollNo. , Game, Grade) as shown in Figure 6.10. Student Dheenq Lalit Parul jack John Badminton Hockey Cricket Badminton Cricket Badminton Hockey -$. .a~——_-_. .-
  30. 30. 119 Fu~c11o~Ai. DEPENDENCY AND NomAusA110N Gaurav Vishal Neha (bl FIGURE 6.11. Student relation. The Primary Key is (RollNo. ). The condition is different Hostel is allotted for different semester. Student relation is in 2NF but still contains anomalies. 1. Deletion anomaly : If you want to delete student Gaurav. You loose information about Hostel [-12 because he is the only student staying in hostel H2. 2. insertion anomaly : If you want to add a new Hostel H8 and this is not allotted to any student. You cannot add this information. 3. Updation anomaly : If you want to change hostel of all students of first semester. You have to search all the students of first semester and update them individually otherwise it causes inconsistency. The solution of this problem is to divide relation Student into two relations Student(RollNo. Name, Semester) and Hostels(Semester, Hostel) as shown in Figure 6.12. student Hostels FIGURE 6.12. Relations in 3NF. Now, deletion, insertion and updation operations can be performed without causing inf-‘onsistency.
  31. 31. 120 INTRDCIJCTION TO DATABASE MANAGEMENT SYSTEM 6.5.2.4 Boyce Codd Normal Form (BCNF) BCNF is a strict format of 3NF. A relation is in BCNF if and only if all detenninanis a, ,, candidate keys. BCNF deals with multiple candidate keys. Relations in 3NF also contains anomalies. Consider the relation Student as shown in Figure 6.l3(a). Student
  32. 32. ‘I21 ppose you want to change Teacher for Subject C. You have to FUNCTIONAL Depeuoencv AND NORMAUSATION 3' uPd0fion anomaly : Su Search all the students having subject C and update each record individually otherwise FIGURE 6.14. Deterrninants in relation student The solution of this problem is to divide relation Student in two relatiors Stu-Teac and Teac-Sub as shown in Figure 6.15. Teac-Sub Candidate Key (Teacher) ®lU3O-‘I-II-55°"-3" Stu - Teac (RollNo. , Teacher) Candidate Key (RollNo. , Teacher) FIGURE 6.15. Relations in BCNF. In this solution all determinants are candidate keys. 6.5.2.5 Fourth Normal Form («IND Ar-elationisin4NFifitisinBCNFandforallM° 6 (MVD) of the form X —)—> Y either _X —) Y is a qiya_l__M_/ D 01jiSJju - ‘ _ Relations in BCNF also contains anomalies. Consider the relation project-Work as shown in Figure 6.16.
  33. 33. 122 INTRODUCTION TO DATABASE MANAGEMENT SYSTEM Proj ect—Work FIGURE 6.16. Project-Work relation. Assumptions : /L A Progranuner can work on any number of projects. /’— A project can have more then one module. Relation Project-work is in BCNF but still contains anomalies. 1. Deletion anomaly : If you delete project 2. You will loose information about Programmer P3. 2. Insertion anomaly : If you want to add a new project 4. You cannot add this project until it is assigned to any programmer. 3. Updation anomaly : If you want to change name of project 1. Then you have to search _ all the programmers having project 1 and update them individually otherwise it causes inconsistency. Dependencies in Relation Project-work are Programmer -9-) Project Project —>—> Module The solution of this problem is to divide relation Project-Work into two relations Prog-Prj (Programmer. Project) and Prj-Module (Project, Module) as shown in Figure 6.17. / Pwi-Pd , /Prj-Module Here Programmer is the super key Here Project is the super key FIGURE 6.17. Relations are in 4NF. 6.5.2.6 Project loin Normal Form (SNF) and join Dependency .7017’ D"l7"”d""". V -' Let R be 3 SW9" relation upto 4NF and it decompose (Projected or divided) into [Ry 132' R3.. ., 1%. }. The relation R satisfy the join dependency " [R1, R2, R3, R"! if and only if joimng of R1 to R“ = R.
  34. 34. Consider the Relation - XYZ with ' 5 ' attributes X4! ( ranch__lD) as shown in Figure 6&8. (Custorner_lD), Y3 (Account_lD) and Z; First. decom _ xy - _ P0‘? Z mto three relations XY. Y2 and Zx_ ' (not‘norig'a| XYZ) " FIGURE 6.18. Relahbns XY, )2, D( are in 5NF and relations XYZ satisfi the join dependency.
  35. 35. 124 ; .r, —p_go. _c: c~. so D. -.r. :.'sass K1ANACEMENT SYSTEM When these three relations give original form of XYZ. So relation XYZ satisfy the join deperzdertq‘. Fro/ -2:2 [v: :r: .‘-'-rrr-.2.-' F: '.". ': (5.'F) : Let R is a relation and D is a set of all dependencies (t_. (ulr. i'. '.: lu~: -<1" depe: :denc§', Functional dependency, Join dependency etc. ) The relation R is in S. 'F WLI. D if for en"-: -r_x' loin De}: -endency, Join Dependency is trivial. 53'! ‘ is ti-2 ultimate . ‘cr: :~al A relation in 5 NF is guaranteed to be free of anomalies. Ccrwlder the exazrgle in Figure 6.18, where relations XY, YZ and 2X are in SNF. SOLVED PROBLEMS Problem 1. . 'o: ::-. alize the following relation EDP (ename, enum, address, dnum, drriago: -nu: '2. ci: .a: r.e, _: r.2:ne, pnum plocation), given functional dependencies enum -+ lename, address, cl: v.': .-1, : ':-_= ..': *.e, c': ::g. 'enurn), dnum -> ldname, dmagrenuml, pnum —> lpname, plocationl, the ; ~:i. ':. =_: j; key is {enum, dnum, pnum]. Discuss each step. Solution. Step L We ass~; :_e that it already in INF because INF deals with atomicity only. .': ru. ' d-3;= :.'*. rie1:}' chart of ED? is ‘; “"d°""" 55.95“-‘-"9"9' Chart fbr relation EDP, . ’-’: _.. .=""*: : is in '_7_'E-‘ ifit ' ' 1I _ - . ‘. dependéitm TF af‘d an “°“ Primary key attributes must be ‘ - - “P°“ Pnmary key attributes. 50.-.350? . -rcne cf1’henon— ' yke . _ ': :cr~. .. 3:; -T_-_-2:, am_: aute&PImm) y attributes 15 fully functional dependent . ;: r?5‘5'3 53? into three new relations . .__. T--_. _: --3; urn, ename, add; -955) 3- D€: ‘i. ".. ':‘~. =.>: ‘: . l»= —~-~ - - dname, dngrenum) —. p . . J-_ _b->-p-; —. ... ,._, -1.. . plocation) Problem 1 : -:'_= ‘;: Le. ' the :1: of-dept, - -. .e: ::2z-: -cl. dept-code, head-
  36. 36. FUNCTIONAL Deveuoencv mo NORMALISATION 125 You can find dept-code by knowing head-of-dept and vice-versa and percentage-time is the time ratio, which .1 prof—code spend for a particular department. Draw the functional dependency diagram for above schema. Solution. I Dent-code l 9; l 5;? EXERCISES 1. L’ Explain what is meant by normalization and its advantages. Explain and define INF, ZNF. BXF, BC. 'F and -[NF by giving suitable examples for each. Compare and contrast Full/ Partial/ Transitive dependencies. [J - T; .‘4.m4"r'-‘:5 'hat do you understand by functional dependency? Explain with examples. Explain l. 'F. '. ’.'F, 3NF. BCNF and 4NF with help of suitable examples. Explain multi—‘alued dependencies and Fourth Normal Form. Eplain Functional and transitive dependencies. " 9‘ 1-" t" S" . ". 'on-loss decomposition is an aid to relational databases". Is it tme? If yes, then justify it through an example. S. Explain join dependency and 5”‘ Normal Form. 9. What is nonnalization ? Explain successive normalization in designing a relational database by taking a suitable example. 10. Explain iein dependency and multivalued dependency. 11. What is normalization ? What is the need of normalization? What do you understand by loss less decomposition ? Define and discuss 3NF and BCNF using suitable examples. 12. Define nonnalization. Take a relation and normalize it upto 3NF. explaining each step. 13. Define BCNF. Using suitable example distinguish between 3NF and BCNF. which is better and whv .7 1-1. What are the problems of bad database design ? Write the procedure of normalizing database while discussing the various normal forms. 15. Define functional dependence and full functional dependence and then explain the concept of 2.'F also with example. 16. Define Functional Dependencies. Write Armstrong rule and show that other rules are derived from Armstrong rules. Also distinguish between full FD and partial FD. 17. State. by giving examples, the conditions that are necessary for a relation to be in [NE ZNF» 3'F, and BCNF. 18. What is functional Dependency? How does. it relate with multivalued dependency? Explain. 19. Why are certain functional dependencies called "trivial functional dependencies"? F~‘Pl3i“-
  37. 37. Liv. -u'.2 U: at S“. ’ zu , l'l. Yr. "‘I, ' inf“: IO r. ‘A‘h; “,A_: Ix! t. '.-r~-xi: -I 'li¢- ll . '.. m.i»_- t. ~i! ~!r- ~ M42.-. ..~_. . 1 hnxuszr 94.1 Cnntritt Na. NR1 f. i-nun: ,, _-. ,', _ - . ‘, T _. . u v - ; ‘, ‘'. , '- _~z 1» } ’ . l’ . -I . :‘ g: I ‘I y! ' O‘ A J_ | lg~. .“. —, l . -. r , ... . ' _, ,.. . - . - ~. s '; ‘, ? v': ‘.'1'. ,' “‘ "‘I"""" I 2‘ (_ , .. V. I . .. ‘_ ‘v_r , _. -‘~ . ': li'~- ‘ cu : -‘—. —-. ‘~»-. . ---7-- u . -— ». ,: —. l . -l , ~ '---. ~v~(‘~v'; *2 ' t A’ : - r. - .1 z . r -- . ,. .r . _v , - ~_. .., ,_. 1.. .‘- '
  38. 38. Chapter TRANSACTIONS AND CONCURRENCY CONTROL 8.1 INTRODUCTION Transaction is a logical unit of work that represents the real-world'events. A transaction is also defined as any one execution of a user program in a Database Management System (DBMS). The transaction management is the ability of a database management system to manage the different transactions that occur within it. A DBMS has to interleave the actions of many transactions due to performance reasons. The interleaving is done in such a way that the result of the concurrent execution is equivalent to some serial execution of the same set of transactions. Conmrrency control is the activity of coordinating the actions of transactions that operate, simultaneously or in parallel to access the shared data. How the DBMS handles concurrent executions is an important aspect of transaction management. Other related issues are how b the DBMS handles paflal transactions, or transactions that are interrupted efgre successful c_q; n_p1e. tion. The DBMS makes it sure that the modifications done by such partial transactions are not seen by other transactions. Thus, transaction-processing and concurrency control form important activities of any Database Management System (DBMS) and is the subject-matter of this chapter. 8.2 TRANSACTION A transaction can be defined as a unit or part of any program at the time of its execution. During transactions data items can be read or. updated or both. 186
  39. 39. TRANSACTTONS AND CONCURRENCY CONTROL 8.2.1 ACID Properties of Transaction The database system is required to maintain the following properties of transactions to ensure the integrity of the data. 1. Atomicity : A transaction must be atomic. Atomic transaction means either all operations within a transaction are completed or none. 2. Consistency : A transaction must be executed in isolation. It means variables used by a transaction cannot be changed by any other transaction concunently. Isolation : During concurrent transaction each transaction must be unaware of other transactions. For any transaction T, , it appears to T, that any other transaction T, is either finished before starting of T, . or started after T, - finished. Durability : Changes are made permanent to database after successful completion of transaction even in the case of system failure or crash. 8.2.2 Tmnsaction States A transaction must be in one of the following states as shown in Figure 8.1. Active state Database after completion of transaction FIGURE 8.1. States of the transaction. . Active state : It is the initial state of transaction. During execution of statements, a transaction is in active state. Partially committed : A transaction is in partially committed state, when all the statements within transaction are executed but transaction is not committed. 3. Failed : In any case, if transaction cannot be proceeded further then transaction is in failed state. 4. Committed : After successful completion of transaction, it is in committed state. 8.3 SOME DEFINITIONS 3.3.1 Serializability Consider a set of transactions (T1, T2 . .., T, -). S, is the st executed and successfully completed and S2 is the state of database after they are executed in any serial manner (one-by—one ) and successfully completed. If S, and S, are same then the database maintains serializability - ate of database after they are concurrently 8.3.2 Concurrent Execution at the same time then they are said to be executed concurrently.
  40. 40. 188 8.3.3 Recoverability gmnsaction has to be performed in me I undo effects 0 an)/ essfuuy then that database n-iaimaim To maintain atomicity of datab-15?! undo effects succ of failm 0‘ W“ *‘°“““°'i. ""' R llback. mcoverability. This process IS known as 0 8.3.4 Cascading Rollback _ T. and T, IS , -- T‘ dc ndcnt upon _ b’ecau<e T2 heads value of C which is updated by T, - failed due to any "3350" than mllback 8.2. Here T2 is dependent upon T‘ FIGURE 8.2. Cascading rollback. if T, fails then rollback T, and also T2. 8.4 WHY CONCURRENCY CONTROL IS NEEDED? To make system efficient and save time, it is required to execute more than one transaction at the same time. But concurrency leads several problems. The three problems associated with concurrency are as follows : ‘ 8.4.1 The Lost Update Problem If any transaction updates U at tirne t -th . . I th this 1 as ‘ W1 out knowing the value of v at ume 9“ ma)’ 93 *0 10st update problem. Consider the transactions shown in Figure 33»
  41. 41. TRANSACTTONS mo Comzunacncv CONTROL 189 Al lime 1; and lg. transactions I, and ti reads variable v respectively and get some value of "- At time I3, 1‘, updates v but, at t‘, ti also updates 17 (old value of 0) without looking the new value of I: (updated by t, ). So, updation made by t, » at 13 is lost at t, , bemuse Ti overwrites it. 8.4.2 The Uncommitted Dependency Problem This problem arises if any transaction Ti updates any variable v and allows retrieval or updation of v by any other transaction but '1'}. rolled back due to failure. Consider the transactions shown in Figure 8.4. Rollback FIGURE 8.4. The unmmmited dependency problem. At time fl, transaction T’, updates variable v which is read or updated by Ti at time (2. Suppose at time I3, T, - is rollbacked due to any reason‘ then result produced by T, is wrong because it is based on false assumption. ' 8.4.3 The Inconsistent Analysis Problem Consider the transactions shown in Figure 8.5. read(a) a - 10 read(b) b - 20 write(a) a - 50 Commit Add a, b a+b-30,not60 FIGURE 8.5. The inwnsistent analysis pmblgm
  42. 42. DATABASE MANAGEMBNN SYSTEM ’ _ d b at time t, and t2 respectively. But at . T 0‘ lgagfle anndn commits at !4 that makes Changes Pennhme ll addition of a and b at time 15 gives wrong 1'95"“ This leads Inconsistency m database of inconsistent analysis bl’ Ti‘ UES 8.5 CONCURRENCY CONTROL TECHNIQ To avoid conCU1'l'9"CY related Problems and to maintain Consistency’ some rules . . (P l need to be made 50 lillal system can be Protected from such Sltuahons and efficiency. 190 lNl'RODUCTlON ‘'0 The transaction T. "ads 8.5.1 Lock-Based Protocols To maintain consistency, restrict modification of the data item by any other transaction W ' ; is presently accessed by some transaction. These restrictions can be applied by applying lhlth on data items. To access any data item, transaction have to obtain l od“ 8.5.1.1 Types of Lock There are two types of locks that can be im ock on it. lock on data item A then T, can ' - r, . is said to be in Shared lock mode_ ' *5 d°“°*°d by S and
  43. 43. 191 T2.1.S#sT‘L‘: S ac C: ~:. '~‘F: ‘—‘~= l‘ CC-"~"‘a’3— A U'3"5a"fi°“ Ti '33“ h°‘d ‘ml? tho& itexs wI'; ‘s: :'—t 2:: L. ‘ ‘ :1 3°. if :2)“ d: -.: a it-3:1 V is 1'30? free then T: is made to wait until v is mam 2: s Th": ;<. ~'—: .-2: 2:3 '. .:: -_’: t.'k data item immediately after its final access. Example : Consider the banking ; n -. »'_-_-. ;'-~_ : .“'~ j~_. -e . account A to account B. Transaction T. in F: -:. ;.-v <" _: §c'. .1: - 55-: *5 —_. l:r. :_-.2 2.-1:} T- shows sum of accounts A and B. Initially gcuw --1: A {-_; _< '. .‘I_'I. ‘ - : -.£ '5 2:. ’ Rs 333/-. X ~ l<. ‘ck(»l; rcad(A); A " A - S00: -. - Write(A): 5-’: -.<S : Unlc-ck(Al: F-22: S - X-lock(B): *. * E read(B); éséuk - S: B - B * SCO; Write(B); un| ock(B): FIGURE 8.7. Tr: 'rs: c:‘: 'o's 7'. .'-z. ‘ '_. ‘Sr : :.. -'.1:. --'_: :: x.. "~3': . The possible schedule for T1 and 1'; is sZ: ~:'-~'. : ‘: 'E‘~g': :~. > Si. HGURE 8.3. fi'€dU/ Z E? L”. ’P$. 'Cl. 'r' T. .'. "I. ' 7‘. :9 ;3> i_~. _-g_~_-_-' _-j__-; _- g___a-, -5’, -:1 um'oc'z; -J an-ve-_". -': :2:} : 'T_-. -- 3 -’. »-. _-, v‘ 32;
  44. 44. es wrong result i. e., Rs.800/- ins .8, T2 8iv ‘Y after its final access. ed immedlaffl [E of [.0Cking _ T d ' ' ed transaction an T 85-1-4 ‘“‘‘’“’‘'° ,1 the end of tra - ' . The mod! “ ' zareshown Keep all l0ClG llntl lead of naumz 3,9. Improved structure of lacking an transactions. The transaction T2 cannot obtain lock on A until A is released by T2 but at that time changes are made in both accounts. Advantages 1. In maintains serializability. 2. It solves all the concurrency problems. Disadvantages 1. It leads to a new problem that is Deadlock. _ TWO P°5Sib1e schedules 5, and S2 for transactions T2 and T2 of Figure 8.9 are shown in Figure 8.10(a, b). g protocol guamt ' - -- . . . lock and unlock requests in two Pha: :Ss'sm“I’z”b”’t. ‘/- In this protocol, every transaction issue (1') Growing phase : In lock. (17) Shrinking or contra obtain new locks. Lock Point : In gr is called Lock Point. this Phase. transaction can obtain new locks but cannot release an)’ ding phase " I" this Phase, transaction can release locks but WWI oints. Transactions T gdprot-om]: . can be ord med ' their 10d‘ P i an T2 in Figure 39 follows e according to
  45. 45. X~Iocl<(A) read(A) A - A - 500 Write(A) X-| ock(B) read(B) B - B + 509 write(B) Un| ock(A) Un| ock(B) Not-possible because 8 is locked in shared mode by T2 and T, have to wait for B. (b) read(A) S-lock(B) read(B) disp| ay(A + B) un| ock(A) un| ock(B) Not-possible because A is locked exclusively by T, and T2 have to wait for A. A-500 B-800 A+B-1300 nouns 3.10. sdmruies s, and $2 for transactions T, and T2.
  46. 46. 1. It may ca 2. It may cause cascading mu ing Protaco f transaction or I : In strict two phase locking protocol, an exdusiv e ' I Lock _ 2' Sm“ Two P my until the transaction commits, locks are kept until the end 0 Advantages 1. There is no cascading rollback in strict two phase locking protocol, Example. Consider partial schedules S1 (in basic two-phase locking) and S2 (in strict two phase locking) as shown in Figure 8.10. In 5, there is cascading rollback of T4 are kept till end and avoid cascading roll 3. Rigorous 'I‘wo-phase Locking Protocol : In rigorous two phase locking mtoc] locks are kept until the transaction commits. It means both shared and exclusive licks co‘ an annot be released till the end of transaction. due to rollback of T3 but in S2, all exclusive lock back as shown in Figure 8.11(n, b). S-| ock(A) X-| ock(B) '°“d(A) rcad( B) x_'°ck(B) write(B) "°“d(Bl X-lock(C) unlock(B) read(C) write(B) X-| oc| <(C) read(C) write(C) Commit un| ock(3) unlock(A) unlock(c) write(C)
  47. 47. 7 TRANSAC1.‘ 8.5.1.6 Lock Conversion CNS AND C°"CURRB~I0r CONTROL 195 To improve efficiency of two. PhaSe lockin (i) Upgrade : Coversion of Shared mode tgoprotlocol, modes of lock can be converted, (in Downgrade : Conversion of exclusive m0:: ctus1ve mode is known as upgrade, Consider transaction T5 as shown in Figure 8120 shared mode is know“ as d°w“E‘"3d9- In T5, an exclusive lock of A is nee ded after somefime so firs t obtain shared lock on A and then upgrade it. During this e -od A and hence increase efficiency_ P 11 an)’ other transaction can also obtain shared lock-_ of FIGURE 8.12. Lock mmcrsion (Upgrade). 8.5.2 Graph Based Protocols otocols, an additional informa ess data items of database. These protocol protocols are not applicable but they required additional information that is not two-phase protocols. ln graph based protocols partial ordering is used. Consider, a set of data items D = [d, , dz, d, , d, , . ..d: l. if d‘ ——) til. then, if any transaction T want to access both d, and d, then T have to access d, first then dl. Here, tree protocol is used in which D '5 viewed as a directed acyclic graph known as database graph. Rules for tree protocol : 1. Any transaction T can 2. After first lock if T wants t parent. 3. T cannot release all 4. Suppose T has obtain T cannot obtain any All the schedules under and recoverability are maintain efficiency- In graph based pr tion is needed for each transaction to know, how they will acc required in lock any data item at first time. o lock any other data item v then, first, T have to lock its exclusive locks before commit or abort. ed lock on data item 11 and then release it after sometime. Then lock on :2 again. graph based protocols maintain serializabflj ed by rule (3) but this reduces concurrency ty. Cascadelessness and hence reduce s are used where two-phase .
  48. 48. ? 2 Multiversion Two-phase Locking tiversion two-phase lockin - 8. adaantages of both ‘ ' d two_phaSe loch“ tech 7 . multiversion concurncncy co f mumversio _8 mque are combined. It is also helpful to overcome the n timestamp ordenng. mp is given to even, , er$- , . ion of each data item. Here timestamp counter never a transaction TRANSACTIO us AND CONCURRENCY CONTROL 201 8.5.5. In mul techniques an disadvantages o Single Tiinesta (‘[‘S—counter) is used instead of system clock and logical counter. whe commits, TS—counter is incremented by 1_ Read only transactions are based upon multiv hmestamp equal to the value of TS-counter. two-phase locking protocol. ntrol ersion timestamp ordering. These transactions are associated with Updations are based upon rigrous 8.6 DEADLOCKS s {T , T, ,.. ., T, ,| such A system is said to be in deadlock state if there exist a set of transaction that each transaction in set is waiting for releasing any resource by any other transaction in that set. In this case, all the transactions are in waiting state. Deadlock es all of the follow tion holding at least 0 held by other processes. ble lock on any data 8.6.1 Necessary Conditions For m is in deadlock state if it satisfi it : Whenever a transac another resources that are currently being I Exclusion 2 When any transaction has obtained a nonshara transaction r ' (iii) No Preemption : When a transaction holds , of waiting processes (iv) Circular Wait : Let T be the set held by T1, Tl is wai is waiting for a resource that is by T2, T, is waiting for a resource that is held by TD. ing conditions. A syste ne resource and waiting for (1') Hold and Iva T, -I such that Tn T = {T0, T1, 5 held ting for a resource that i g Deadlocks 8.6.2 Methods For Handlin ndling deadlocks. These are : are Two methods for ha There enfion 8.6.2.1 Deadlock Prev it is better to prevent the _ . recover it. Different approaches for preventing 1. Advance Locking : it is the simplest technique in w required data items at the beginning of its execution in atomic manner. It locked in one step or none is locked. Disadvantages 1. it is very hard to pre 2. Under utilization is high. 3. Reduces concurrency. m deadlcok condition then to detect it and then deadlocks are : hich each transaction locks all the means all items are dict all data items required by transaction at starting. 2. Ordering Data Items : In this approach, all the data items are assigned 1,2,3,. .. etc. Any transaction T, can acquire l0Cl6 in a particular order, such as i order or descending order. Disadvantages : It reduces concurrency but less then advance locking. T " ‘*-~ uu oer ~ — _ — ---~r uansacfi ' ' ""’ I added in cost factor in . ..4.? .:‘, as . ,.. V
  49. 49. Si$TEM 202 lmaooucnow T0 Danansa MANAGEMENT _ c Two Phase Locking " 1“ “"5 appwa to increase concurrenC_V'- Two diffe h, two-phase locking 3. Ordering Data Items with with ordering data items is used mm techniques to remove deadlock mestamps : There are -1. Techniques based on 7? based on time stamps. _ (i) Wait-die : ln wait-die technique if any . held by 1‘, then T; have to wait on1}' ‘ is rolled back. ! lf TS(T, ')<TS(T, ') T, waits; eeds any mgoume that is presently f ‘t h s timestamp smaller than T, otherwise Ti 1 3 ‘ else T‘ is rolled back; It is a nonpreemptive technique. (ii) Wound-wait : In wound—wait technique, . is presently held by T’ (hen T’. have to wait on otherwise T, is rolled back. If T$(T, )>T5(T, l T‘ waits; if anv transaction T, needs an_v resource that ' lv if it has timestamp larger then T’ else '1', is rolled back; It is a preemptive h'chm'que. Disadvantage : There are unnecessary roll backs in both techniques that leads to starvation. 8.6.2.2 Deadlock Detection and Recovery ln this approach, allow system to enter in deadlock state then detect it and recover it. To employ this approach, system must have: 1. Mechanism to maintain infonnation of current allocation of data items to transactions and the order in which they are allocated. 2. An algorithm which is invoked periodically b_v system to detennine deadlocks in svstem. 3. An algorithm to recover from deadlock state. Deadlock Detection : To detect deadlock. maintain wait-for graph . _ 1' wmbfor Gmp’: : t Consists of 3 P3” G = (V. E). where V is the set of vertices and E is the set of edges. ert1ces show transactions and edges show dependency of transactions T T. x,1requels; t ‘awdata item which is currently being held by T, then, an edge Deadlock occurs wlh grill? . - en [Tl {ms olitamed the required item {mm Tl "~"“°"'3 ‘he edge- 9“ 9“? 15 cyc e In wait-for graph. Consider wait-for graph as shown in l-"igure 8.17. G 0 6 FIGURE 8.17. Wait-for graph wighou‘ demflod‘ 5’-“Mao”
  50. 50. Tnmsacnous mo Concunnmcv CONTROL 203 There are three transactions T, , T2 and T3. The transaction T, is waiting for any data item held by T2 and T2 is waiting for any data item held by T3. But there is no deadlock because there is no cycle in wait-for graph. Consider the graph as shown in Figure 8.18, where T3 is also waiting for any data item held by Tl. ' FIGURE 8.18. Wait-fi)r graph with deadlock Thus, there exists a cycle in wait-for graph. T, —> T2 —> T3 —> T1 which results system in deadlock state and T1, T2 and T3 are all blocked. Now a major question arises : How much will be the time interval after which system invokes deadlock detection algorithm? It depends upon Two factors 2 1. After how much time a deadlock occurs. 2. How many transactions are deadlocked. 2. Recovery from Deadlock : When deadlock detection algorithm detects any deadlock in system, system must take necessary actions to recover from deadlock. Steps of recovery algorithm are : 1. Select a victim : To recover from deadlock, break the cycle in wait-for graph. To break this cycle, rollback any of the deadlocked transaction. The transaction which is rolled back is known as victim. Various factors to determine victim are : (i) Cost of transaction. (ii) How many more data items it required to complete its task. (iii) How many more transactions affected by its rollback. (iv) How much the transaction is completed and left. 2. Rollback : After selection of victim, it is rollbacked. There are Two types of rollback to break the cycle : (0 Total rollback : _ This is simple technique. In total rollback, the victim is rolled back completely. (ii) Partial rollback : In this approach, rollback the transaction upto that point which is necessary to break deadlock. Here transaction is partially rolled back. It Is an effective technique but system have to maintain additional information about the state of all running transactions. 3. Prevent Starvation : Sometimes, a particular transaction is rolledback several 51.335 and never completes which causes starvation of that transaction. To prevent starvation, set the limit of rollbacks or the maximum number, the same transaction chooses as victim. Also, the number of rollback can be added in cost factor to reduce starvation
  51. 51. we 204 l. 'T1-lODUC‘l1ON TO U mes . . . . . 8.6.2.3 ‘Fmeoui-based 5‘hc. b tween deadlock P“-“'°‘“‘°" and. d‘a: “od detech°“ and ‘ exists in L‘ _ g _t .1 fixed time or any peso , _ “mm”! -baled Schcmele transaction wall: f°r at . mOg h-1 it is mid to be tim “me. E. Ycover-"‘ In this “heme, H t re¢‘.0Ul'CL‘ in that fixed time t 0.1 - . eoui and cti "s not able to gt‘ ~ ‘ , ,- . _ . transa ‘on l<‘d nd restarted fill‘-V "‘m“'“mL . I mmgmon ls rollbac k a 3 nsiction-4 are short and long waits cause deadlocks _ . '. , i ll whi. ’l‘L‘ ra ‘ . ‘ This approach is pfLfLfl’Lt ' ‘ l ate time for link out. i icfll “‘ cl‘°°-"3 ‘lppmpri o short then it cause . . . .‘ to , . ws and it it is [is uniieccssa -. It is typ if it is too long then it refill Ufln£*Ct'5.<.1r_‘ rollbacks. EXERCISES ‘iated within DBMS. 5 the various problems assoc ry del 1. 'hat is concurrency ? Discus . . - . v cmt Discuss the locking techniques for concurrenc) L wing concurrency control techniques W ml with L‘dl‘| plL‘S. ith exaniples: [4 3. Discuss the follo (:1) Locking techniques lb) Techniques based on time stamp ordering (c) . lulti'ersian concurrency control techniques. 4. What do you understand by concurrency in database methods in databases with examples. Explain the multiversion concurrency con :2 ? Discuss the various conuiriencv control trol techniques. ble problems associated with it in DBMS. Discuss examples. Also discuss, what do you fall 6. Discuss the concurrency and the various possi various concurrency control techniques through suitable understand by serializability? 7. What are distributed locking methods ? When and how are they and disadvantages of distributed locking methods. ' 8. Explain the concept of concurrency and concurrencv control in database. used ? Also discuss advantages 9. E l ' t ‘ . - . - . . xp am he vanous concurrency control techniques without locking with examples. mits or 10. Dur'n ‘ts ' ' . _ , _ _ 1 g i execution a transaction passes through several states, until it finally com n why abort . l. ’ l: ' . . S '5 3“ P_‘’_55‘ble 5"‘l”°“C9 Of States through which a transaction may pass. Explai each state transition may occur? ' 11. What is a transaction 7 What are t ' - he ro erties of ‘ - - - red when a System failure Occurs? P P a transaction ? How is transaction recow 12. Wh t d . a 0 you mean by concurrency? Explain how the concurrenc‘. Problems can be golwd with the use of locks_ 13. What robl= ' . P uns Wlll occur due to concurrent execution of two transactions? What are the" solutions. 14. What benefits does strict two - phase lock - . 15. What are ACID properties? Explain IDS Provide? What dlsadvantages result? 16. What do you understand by serializability? Explain_ 17. What do you mean by concurrent-access I anomo ies in datab 7 18. What are concurrenc ases ' y problems? How are the ~ y solved ?
  52. 52. 19. 205 TRANSACTIONS AND Co~icu. iRr~t'v Cor-‘H101 What is time stamping > no“. . - can it . . Nip uf _m . ,_, n.p| L._ N “‘‘‘‘l '0!’ Clmcurrency contml? l)«"i('fll‘N: this with the 20. An | ‘“‘l""-‘ l‘<‘0l«s i. ~suc. s/retiini -wsi. » . . ‘m "1 -1 library allows . fI(| 'i meinhers to reserve books 28. 29. on-line A nit-mlxvr may also mm , “ dm "1. _ _ I . also uwd by the hbrtumm m “mi ‘NW lznupnhscriatioii rpiiiiovat, if so cl. --. ire. l This systern is “-rm. nu. P_. ‘.ud, “,dt. _. M H“, n‘_n_“M_v 'm; “:"_ mks art. uiitsianiling for l’tnI'I. ' than . i month. l>I‘blt. ‘ll1.‘ that may occur due to cuncurrrntlt W“ “cm” ll" lh“ ‘l""*‘"‘ -‘l”""'1‘ Wm‘ -17" “W ln| .k. m§_ ""““‘l'“"'-* as above ‘ How can you solve these E ‘l. lll the ounce ts is - . . . _ Delscribe the mlerof tifi-: ::; s‘| ,:: :TTt: E:"X Wllh an example. How is it N, 'lJll‘d to scrialilabrlity ’ ‘ R In the pmcess of deaillock detection and avoidance . -n airline re. ser'ation allows min- ‘ _ “ ‘ _ __ _ . ‘ “"‘l“| '“l‘r*. “~ to book tickets simiiltaneouslv. What are the [s. !:~l cinuirn. nt_ nlated pmi~. ]m“ H, “ , ‘ _ - . ~. -. 3 . . ' ‘ M" m‘‘. ‘' "“C0Unlvr. in case no concurrency control mech. iiii. sm is mplace. -hat is the coiumm Pm K“ b h ' ~ c_ - -. . Nmcd problem “S “bo_“. , Pk _v OU to 0LfCOITl t 1 concurrency Corn ‘are and cimtrrsl th - f- - - _. . . V l _. _ . ‘ _'~ ~“1lI‘. ~ of simple locking. intention mode locking and time stamping m°dl‘"“’m“ {mm th'“ "“-""l»“‘l'“ Of transiction control undir C0flCl| ' nt t ‘t' “ ' I Tl‘ I’-lnI I0n‘~ Dl~“C“-‘-" "“""l“ll°" ~‘“d "W0llnd-wait" 0PProaches of deadlock J. (ld. l!‘IL‘l. Compare th so. . . ' I. Q » approaches of deadlock avoidance with a deadlock aviodance approach in which data items are locked in a particular order (according to their rank) '5up_L‘05c‘ _'0U «Irv working in a bank and have been assigned as a transaction programmer, ‘ . : S " ' - - hat measure ill _ou-take such that your transaction programs do not cause any concumzncy related problem 7 Explain with help of example. What is deadlock in the contest of concurrent transaction execution? When does it occur? How is it detected in centralized database system? How can it be avoided? A company has increased the basic salary of its employees by Rs. 2000. The salary was further iving an additional increment of 10%. The company also has a concurrent luates average salary of its employees. Write the necessary transactions, upgraded by g Its. Use appropriate schemes to avoid transaction, which eva which can run concurrently to give appropriate resu undesirable results. he following concurrent transactions: ys 500 shares from company Y makin f Rs. 6000 to companies X and Y. ransaction is to be calculated. ons that can run concurrently. U Assume t — Company X bu — Company Z provides loan 0 — The monthly average of financial t ode programs of the transacti g a financial transaction of Rs 500. se appropriate Write the pseudoc schemes to avoid undesirable results. Assume the following concurrent banking transactions. ‘O0 " be I — Transfer of I6. 2000 from Ac 230 to Account num r — Vlfithdrawal by account 500, an “m0 — Finding out the average balance for WW’ Write the necessary transaction pseudocode P'° appropriate schemes to avoid undesirable resullS- count number unt of Rs. 5000- us accounts of the bank. . . Use gram, which can run concurrently’
  53. 53. Chapter DATABASE RECOVERY SYSTEM 10.1 INTRODUCTION A Computer system can be failed due to variety of reasons like disk crash, power fluctu2:': .—-, software error, sabotage and fire or flood at computer site. These failures result in the lost of information. Thus, database must be ca able of handlin such situations to ensure that 5-2 atomicity and durab pro system is the recovery mmmger that is responsible for recovering the data. It ensures atomicity by undoing the actions of transactions that do not commit and durability by making sure all actions of committed transactions survive system crashes and media failures. The reco'-'e: j.' manager deal with a wide variety of database states because it is called during svstem failura Furthermore, the database recover is the rocess of restorin the database to a consisterz 'e°°"9". V teclfnicllles Used 50 that the database system be restored to the most recent consistent state that existed shortly before the time of system faflm-e_ 10.2 CLASSIFICATION OF FAILURES 1. System crash : This fa'l h failure etc. lure appens due to the bugs in 5°ft“'aTe 01' by hardware 3. Disk failure : This failure ha (1 . transfer of data etc. Ppens ue to head Crash’ tea"-“S °f “P95: failure dunng 212
  54. 54. DATABASE Rscovsav SYSTEM 213 10.3 RECOVERY CONCEPT . Re-co'. 'e'r'. ' irrxrr. failure stat- refers to the method b which 5 stem restore its most recent cmfigsggao cgg*(_- , ~~. + before the time of failure. There are several methods by which you can , . , -.- - . recover dataoase . rrz:1 tax. ure state. ese are defined as follows : 10.3.1 Log Based Recovery In log based recr: -.'er~. ' svstem, a log is maintained, in which all the modifications of the . -‘-. log consists of log _r_ecords. . For each activity of database, separate log record is made, 11;’; records are maintained in a serial manner in which different activities are tfiap§>er. ed. Tnere are various log records. A typical update log record must contain follouirig re s: Ii) Tr; r:; .2:: :’. :n r. {err. ‘:fi»: r : A unique number given to each transaction. (ii) DJ:1-ztzm identifier : A unique number given to data item written. (iii; D: :e : r.: i time of updation. (iv; O: ':’ 2-2322 : ‘. ’al1.: e 0:’ data item before write. . 'a»'; .‘. ’e: .‘ : ‘:. 'ue : ‘Jalue of data item after write. l_. o2s must be on the non-volat: i.le_(stab1e) storage. In log-based recovery. the follow-. 'ir~. g two cperatzcns for recovery are re uired 2 (I) Had: . It rr. ear. s, the work of the transactions that completed successfully before crash is to be again. (ii; Univ: . It means. all the work done by the transactions that did not complete due to crash is to be undone. The redo and undo operations must be idempotent. An idempotent operation is that 0"‘-' or mm times- For any transaction T. various log records are : [T. star1l . It records to log when T, starts execution. IT, A] : lt records to log when T, reads data item A’. IT‘, A. V, V: ] ; lt records to log when T, updates data item A], where / I refers to old value and V2 refers to new value of [T Commit] 2 It records to log when T, successfully commits. [T_ abo. -Ls] : It records to log if T, aborts. There are two types of L9; Based Recovery teclmiques and are discussed below : I‘E'! l‘0.3.1.1 Recovery Based on Deferred Database Modification In deferred database modification technique, deferred (stops) all the write operations of any Transaction Tl until it partially commits. It means modify real database after T, partially commits. All the activities are recorded in log. Log records are used to modifv actual database. Suppose a rransaction T; wants to write on data item A], then a log record [T‘, A’, V1, V2] is sated In log and it is used to modify database. After actual modification T. enters in committed state. In this tzchrzxaue. the old : ~.z1ue field is not needed. '
  55. 55. 214 lnmooucnou TO Dxruuss MANAGEMENT SYSTEM Consider the example of Banking system. Suppose you want to transfer Kggoo from . Account A to B in Transaction '_l', and deposit 16.200 to Account C in T2. The transacfiom T 1. A and T, are shown in Figure '10. FIGURE 10.1. Transactions T, and T, Suppose, the initial values of A. B and C Accounts are Rs. 500, Rs. 1,000 and R5_ em respectively, Various log records for T, and T2 are as shown in Figure 10.2. [T, start] [T1-Al [T, , A, 300] [T1 . 91 rr, . 3, 1200] [T1 ccmmtz] (T, nun] IT. » C] [T_, , C, 500] [TI commxt] . ..___. ~.-. ._. ... ..-. ..__. ... ... ... ._. ... ... .. . ... ,~. ... _-. .._. _. _“ . . l FIGURE 10.2. Lt-_g n-; t~rJs ft‘! !! .:~1.. .'¢r. '.v-3 T, JPJ T‘, For a redo operation. log must contain [T, start] and IT, commit] log records. [T, sun] IT‘. A] | T,, A, 3 (3) 00] [T, , A] 1 [T, , A, 300] ' 17.. B1 I [T, , B, 1200] [T, commit] [T2 start] [T; -- C) [T2, C, 800] (D) “CURE ‘°-3- L08 Of mmsactions T, and T2 in case of crash. -
  56. 56. DMAIIME RECOVERY SYSTEM 21 5 Crash will happen at any time of execution of transactions. Suppose crash happened (1') After write (A) of T, : At that time log records in log are shown in Figure 10.3(a). There is no need to redo operation because no commit record appears in the log. Log records of T, can be deleted. (ii) After write (C) of T2: At that time log records in log are shown in Figure 10.3(b). In this situation, you have to redo T, because both IT, start] and [T, commit] appears in log. After redo operation, value of A and B are 300 and 1200 respectively. Values remain same because redo is idempotent. (iii) During recovery : If system is crashed at the time of recovery, simply starts the recovery again. 10.3.1.2 Recovery Based on Immediate Database Modification in immediate database modification technique, database is modified by any transaction T, - during its active state. It means, real database is modified ‘ust after the write 0 eration but after log record is written to stable storage. This is because log records are used during recovery. Use both Undoflandkedo operations in this method. Old value field is also needed (for undo opera ion . onsi er again the banking transaction of Figure 10.1. Corresponding log records after successful completion of T, and T2 are shown in Figure 10.4. IT. » (7,, A. 500. 300] IT. . 8] [T, , 3, 1000, 1200] [T, , commit] [T3, start] ITz- Cl [T_, , C, 600, 800] [T, commit] FIGURE 10.4. Log records fivr transactions T, and T, G For a transaction T, to be redone, log must contain both [T, start] and [T, commit] records. 6 For a transaction T, to be undone, log must contain only [T, start] record/ . [T, start] [T, start] IT. » AI FR. AI FR. A. 500. 300] [T, , A, 500, 300] IT. » 3] [T, , B, 1000,1200] [T, commit] [T, start] ITz- CI [T2, c, soo,3oo) la) (b) FIGURE 10.5. Log of transactions T, and T_, In case of crash.
  57. 57. -. o_ J. 216 / MnooucrioN T0 Dnmsnse MANAGEMENT SYSTEM Cnsh Wm happen at any time of execution of transaction. Suppose crash h.1ppcn, _.d_ m After write (A) of T1 : At that time log records in log are shown in Figure 1o_5(n)_ Here only [T, start] exists so undo transaction '1', As a result, Account A restores its old value 500. . 1-, ) After write (C) of T2 : At that time log records in log are shown in Figure l0.5(b)_ During back-up record [T2 start] appears but there is no [T2 commit], so undo transaction T2. As a result, Account C ‘restores its old value 600. When you found both [T, start] and [T, commit] records in log, redo transaction T‘ and account A and B both keep their new value. (iii) During recovery : If system is crashed at the time of recovery simply starts recovery ( again. 10.3.1.3 Checkpoints Both the techniques discussed earlier ensures recovery from failure state, but they have some disadvantages such as : ,(i) They are time consuming because successfully completed transactions have to be iedorre_ (ii) Searching procedure is also time consuming because the whole log has to be searched, So, use checkpoints to reduce overhead. Any of previous recovery techniques can be used with checkpgints. All__the tr_; ms_actions completed successfully or having [Tl commit] record _. _»- before [checkpoint] record need not to be redone During the time of failure, search the most recent _clii. -ckpoint. All the trai_i. :ac_tiogns5o_n_1p: leted s_q; gc&‘. sf_ul_ly after checkpoint need to be redone After searching checkpoint. search the most recent transaction T‘ that started execution before that checkpoint but not completed. After searching that transaction, redo/ undo transaction as required in applied method. Aalvmitnges 1. Niieed to redo successfully completed transactions before most recent checkpoints. 2. Less searching required. 3. Old records can be deleted. 1 0.4 SHADOW PAGING ; 5_l1-ldow Fagin is an alternative techni ue for recovery to overcome the disadvantages of l°8' based recovery techniques. The main idea behind the shadow paging is to keep M0 Page mbl‘-'5 l“ database. one is used for current ogemtions and other is used in case of n'coveg/ - Database is partitioned into fixed length blocks called pages. For memory management’ °d°Pt any Paging technique. Page Table To keep record of each page in database, maintain a page table. T P3 L9_9° '“"“b°T Of pages in database. Each entry in page table contaifls 3 pointer to the physical locatio§f pages_ "" m t¢‘V i. .= 1‘ s' J «-. .~
  58. 58. D/ mxaase Recovenv Svsrm 21 7 The two page tables in Shadow Paging are : H) Shadow page table : This table cannot be changed during any transaction. (41) Cum. -n! page table : This table may be changed during transaction. Both pa c tables are identical at the start of transaction. Suppose a transaction T, performs a write operation on data item V, fit residésuin page 1'. The write operation procedure is as follows : 1. If ; '" page is in main memory then OK otherwise first transfer it from secondary memory to main memory by instruction input (V). 2. Suppose page is used first time then : (a) System first finds a free page on disk and delete its entry from free page list. (b) Then modify the current page table so that it points to the page found in step 2(a). 3. Modify the contents of page or assign new value to V. Page 5 (Old) 2 2 3 3 4 4 5 5 5 Page 5 (New) 6 7 7 8 8 Current Page Table Shadow Page Table FIGURE 1(.6. Shadow paging.
  59. 59. MANAGEMENT . To DATABASE ’ . _ rations the following 5“’P5 are needed to c°"‘"“f : . . to disk. from main m°"‘°ry 213 lr~rrRoouCT70” rmins 3" ‘he "b°Ve ope f After Per 0 P3565 are transferred 1. Modified 2. Save current Page table on disk. dow page table (b, V ‘3ha“3i"8 the Physical add ' : .ha . . T 3‘ Change current Page i: bletcl>Ir1e: in non-volatile or stable storage b"C3"5° ” '5 Used "S , t e S Shado“, page fflblt. mus Crashed and you have (0 l’*"-‘C°‘’‘3’'' “- -“-her . , suppose the : =,V5t"m ‘ ' the time of f8C°"'~”. " . - . {fed into shadow age tab! . t . a 9 table 15_E9l‘_‘. °___- —r————L. . 5““"'55{"] com lefion of transaction’ ChndenFor mmhe search sii11Pl , store shad s adow P389 ‘ab’? h"5 mtge ifetarc : NO redo operations need to be invoked. S . . . ' t: T s orag . A table on fixed location 0 bl d C rrent Page table are shown in Figure 106 after write operation on Pas’ . . page ta e an U Disadvantages _ 1 Data fregmentation due to fixed sized [M395- 2. Wastage of memory 5P3C9- 3 E head at the time of commit (bv transferring of page tables) . 'X"'J OVCT ' 4. it provides no concurrency. EXERCISES I UiHCuss rocmcrv What is the need of performing n. -cm'erv 7 Discuss the '-m'0U5 methods pt-rfomiing recovery. 2. l)('>(! 'liN' lo); -iniscd and clit-cltpoint schemes for data n-covery. 3 Explain lng~b. i~ed rt-cm; -n’ nit-thuds What . in- the rule of clit-cl-tpotiit. -6 in it. Discuss. 4. Lxplain aniuiu- vi the rm-tht-ds to ru-nu-r the lost data in . i d. it. ih. i~c system. 5. Wliitli iipr of Liiluu-~ can «In! in .1 sistt-m ‘ 6. (unip. m- the tit-it-ricd and imnit-di. itt--riitdilimtiun '1'l'| nns iii the lo_g-hm. ..-d rt-aw cry in term til Dt‘fht‘. !d Jlltl iiiipli-iii: -iit. itn~n 7. l: pl. mi it-tliiiiqiim ut H‘t| I. ‘f' vi d. it. ib. isc in dt'(. Ill 8. l)i. ~cus. -. ianuus data ! l‘1.Urr issues l'h. it . m- the ti-(iitiiiitit-~. .iv. ii| .tblt- for data recovery? on t‘. ui . i lug tilr be used in; n-giivcry? Describc. an t-. iiiiplt- that includes all [l‘l‘l'T' ti}‘'f. lfItins (UNIX), REDO, etc). lion can wii ll'(Ut'! mini media Liilu W ('0 which your d. it. iba~e was stored. ’ Describe Rh! ntcfhdlllsm of such f(~(g)(-f ll. Why is i*cu'er nu-do. -d in databa. I. ~li'-ha. -ti - . . I L}, ‘Q reuier_ diiitri. _ d n, cUn, n,; 12. What do you und{If§L"‘d b‘. ' ‘ recovery and r l b it ' - . ' Y “vi”! the help at an “ample . e ia it } in tho. contut of database systems. ’ EXPW“ When is a system Cunqden ~ a . -d to be reliable’ ' . - . - » any two rt'Cn'en- Schemes ’ Explain "*0 "W0? types of system failures. U5‘ _ 14. Describe the shadow 9. What I . I In}; flit‘ ‘ “hat din-s it u~tit. iiii’ H this with the help til "Hm with checkpointing 139") 4 S‘based recovery schemes 0"“ .
  60. 60. Chapter QUERY PROCESSING AND OPTIMIZATION 1 1 .1 INTRODUCTION processing rguires that the DBM_fiden_tify__andg execute a stratgv for retrievi_ng the regiizs of the guerv. The query determines what data is to be found, but does not define‘ti~. e rrzetltod by which the data manager searches the database. Therefore Query optimization Ls rte-cesxsary to determine the optimal altemative to process a query. There are two main techniques _ ' 'on. The first approach is to use a rule based or heuristic method for "‘~ the operations in a query execution strategy. The second approach estimates the cost ‘ t execution strategies and chooses the best solution. In general most commercial f both techniques. 11.2 3ASlCS OF QUERY PROCESSING rocedure of convertin a”quer_v written in l1l3_. l‘ Query Processing : Que Processin is a level Ex. SQL, QBE) into a correct and efficient execution plan e. }'_~re. &<ed in low- 1 , which is used for data manipulation. Quay Processor : cessor is responsible for generating execution plan. pwise process. Before retrieving or updating data Execution Plan : Query processing is a ste . . hrough a series of query compilation steps. These steps are known depends upon its query processor i. e.. how mu‘-‘h n create? The better execution plan leads to low time and cost. hase is transformation in which parser first checks the syntrc butes used in the query that are defined in the In query processing, the first p query is transformed int‘) of querv and also checks the relations and attri database. After checking the syntax and ierifying the relations. 219

×