Distributed database (DDB) notes covering all the important topics. These notes are intended mainly for computer science and IT students, and every topic is explained precisely.
2. • A CENTRALIZED DATABASE (SOMETIMES ABBREVIATED CDB) IS A
DATABASE THAT IS LOCATED, STORED, AND MAINTAINED IN A
SINGLE LOCATION. ... USERS ACCESS A CENTRALIZED DATABASE
THROUGH A COMPUTER NETWORK WHICH IS ABLE TO GIVE THEM
ACCESS TO THE CENTRAL CPU, WHICH IN TURN MAINTAINS
THE DATABASE ITSELF.
3. • LIST OF THE ADVANTAGES OF A CENTRALIZED DATABASE
• IT ALLOWS FOR WORKING ON CROSS-FUNCTIONAL PROJECTS.
• IT IS EASIER TO SHARE IDEAS ACROSS ANALYSTS.
• ANALYSTS CAN BE ASSIGNED TO SPECIFIC PROBLEMS OR PROJECTS CENTRALLY.
• HIGHER LEVELS OF SECURITY CAN BE OBTAINED.
• HIGHER LEVELS OF DEPENDABILITY ARE PRESENT WITHIN THE SYSTEM.
4. • DISADVANTAGES
• CENTRALIZED DATABASES ARE HIGHLY DEPENDENT ON NETWORK
CONNECTIVITY. ...
• BOTTLENECKS CAN OCCUR AS A RESULT OF HIGH TRAFFIC.
• LIMITED ACCESS BY MORE THAN ONE PERSON TO THE SAME SET OF DATA AS
THERE IS ONLY ONE COPY OF IT AND IT IS MAINTAINED IN A SINGLE LOCATION.
5. • DDB:
A DISTRIBUTED DATABASE (DDB) IS A COLLECTION OF MULTIPLE, LOGICALLY
INTERRELATED DATABASES DISTRIBUTED OVER A COMPUTER NETWORK.
A DISTRIBUTED DATABASE MANAGEMENT SYSTEM (D–DBMS) IS THE SOFTWARE THAT
MANAGES THE DDB AND PROVIDES AN ACCESS MECHANISM THAT MAKES THIS
DISTRIBUTION TRANSPARENT TO THE USERS.
DISTRIBUTED DATABASE SYSTEM (DDBS) = DDB + D–DBMS
6. • ADVANTAGES OF DISTRIBUTED DATABASE SYSTEM
• RELIABLE
• IN A DISTRIBUTED DATABASE MANAGEMENT SYSTEM, IF ONE CONNECTED SITE FAILS, THE REMAINING SITES
CONTINUE FUNCTIONING, SO THE SYSTEM AS A WHOLE STAYS AVAILABLE. THIS MAKES IT MORE RELIABLE THAN A
SINGLE-SITE DATABASE MANAGEMENT SYSTEM.
• LOW COMMUNICATION COST
• DATA IS STORED LOCALLY AT THE SITES WHERE IT IS USED MOST, SO MANY QUERIES CAN BE ANSWERED WITHOUT
CROSSING THE NETWORK. THIS MAKES COMMUNICATION AND DATA MANIPULATION EASIER AND LESS COSTLY.
• MODULAR DEVELOPMENT
• MODULAR GROWTH IS EASY IN A DISTRIBUTED DATABASE MANAGEMENT SYSTEM. NEW SITES CAN BE ADDED BY
INSTALLING THEM AND CONNECTING THEM TO THE DISTRIBUTED DATABASE SYSTEM, WITHOUT INTERRUPTING
THE EXISTING SITES.
7. • DATA RECOVERY
• DATA CAN BE EASILY RECOVERED IN DISTRIBUTED DATABASE MANAGEMENT
SYSTEMS.
8. • DISADVANTAGES OF DISTRIBUTED DATABASE SYSTEM
• DATA INTEGRITY
• UPDATING DATA AT MULTIPLE SITES CAN CAUSE PROBLEMS. MAINTAINING DATA INTEGRITY IS
MORE COMPLEX AND HARDER TO HANDLE THAN IN A CENTRALIZED SYSTEM.
• DUPLICATION OF DATA
• STORING THE SAME DATA IN DIFFERENT SYSTEMS CREATES DUPLICATE
DATA. KEEPING THESE COPIES ON DIFFERENT COMPUTER SYSTEMS TAKES EXTRA
STORAGE SPACE IN DISTRIBUTED DATABASE MANAGEMENT SYSTEMS.
9. • IMPROPER DATA DISTRIBUTION
• IMPROPER DATA DISTRIBUTION CAN LEAD TO SLOW RESPONSE IN THE PROCESSING
OF QUERIES. STORING THE SAME DATA ON DIFFERENT COMPUTERS CAN CREATE
FURTHER PROBLEMS IN DISTRIBUTED DATABASE MANAGEMENT SYSTEMS.
• LESS PROCESSING SPEED
• EVEN A SIMPLE QUERY MAY NEED A LOT OF COMMUNICATION BETWEEN SITES. FOR THIS
REASON, MORE TIME MAY BE REQUIRED TO ANSWER IT.
• SECURITY PROBLEM.
10. • DESIGN ISSUES OF DISTRIBUTED SYSTEM :
• 1) HETEROGENEITY : HETEROGENEITY APPLIES TO THE NETWORKS, COMPUTER
HARDWARE, OPERATING SYSTEMS AND IMPLEMENTATIONS BY DIFFERENT
DEVELOPERS. A KEY COMPONENT OF A HETEROGENEOUS DISTRIBUTED
CLIENT-SERVER ENVIRONMENT IS MIDDLEWARE. MIDDLEWARE IS A SET
OF SERVICES THAT ENABLES APPLICATIONS AND END USERS TO INTERACT WITH
EACH OTHER ACROSS A HETEROGENEOUS DISTRIBUTED SYSTEM.
11. • 2) OPENNESS: THE OPENNESS OF THE DISTRIBUTED SYSTEM IS DETERMINED
PRIMARILY BY THE DEGREE TO WHICH NEW RESOURCE-SHARING SERVICES CAN
BE MADE AVAILABLE TO THE USERS. OPEN SYSTEMS ARE CHARACTERIZED BY
THE FACT THAT THEIR KEY INTERFACES ARE PUBLISHED. IT IS BASED ON A
UNIFORM COMMUNICATION MECHANISM AND PUBLISHED INTERFACE FOR
ACCESS TO SHARED RESOURCES. IT CAN BE CONSTRUCTED FROM
HETEROGENEOUS HARDWARE AND SOFTWARE.
12. • 3) SCALABILITY: THE SYSTEM SHOULD REMAIN EFFICIENT EVEN
WITH A SIGNIFICANT INCREASE IN THE NUMBER OF USERS AND RESOURCES
CONNECTED.
• 4) SECURITY : SECURITY OF AN INFORMATION SYSTEM HAS THREE COMPONENTS:
CONFIDENTIALITY, INTEGRITY AND AVAILABILITY. ENCRYPTION PROTECTS
SHARED RESOURCES AND KEEPS SENSITIVE INFORMATION SECRET WHEN IT IS
TRANSMITTED.
13. • 5) TRANSPARENCY : TRANSPARENCY ENSURES THAT THE DISTRIBUTED SYSTEM
IS PERCEIVED AS A SINGLE ENTITY BY THE USERS OR THE APPLICATION
PROGRAMMERS RATHER THAN AS A COLLECTION OF COOPERATING AUTONOMOUS
SYSTEMS. THE USER SHOULD BE UNAWARE OF WHERE THE
SERVICES ARE LOCATED, AND THE TRANSFER FROM A LOCAL MACHINE TO A
REMOTE ONE SHOULD BE TRANSPARENT.
14. • 6) CONCURRENCY: THERE IS A POSSIBILITY THAT SEVERAL CLIENTS WILL
ATTEMPT TO ACCESS A SHARED RESOURCE AT THE SAME TIME. MULTIPLE USERS
MAKE REQUESTS ON THE SAME RESOURCES, I.E. READ, WRITE, AND UPDATE.
EACH RESOURCE MUST BE SAFE IN A CONCURRENT ENVIRONMENT: ANY OBJECT
THAT REPRESENTS A SHARED RESOURCE IN A DISTRIBUTED SYSTEM MUST ENSURE
THAT IT OPERATES CORRECTLY WHEN ACCESSED CONCURRENTLY.
15. • TYPES OF TRANSPARENCY:
• 1)LOCATION TRANSPARENCY:
• LOCATION TRANSPARENCY ENSURES THAT THE USER CAN QUERY ON ANY
TABLE(S) OR FRAGMENT(S) OF A TABLE AS IF THEY WERE STORED LOCALLY IN
THE USER'S SITE. THE FACT THAT THE TABLE OR ITS FRAGMENTS ARE STORED
AT A REMOTE SITE IN THE DISTRIBUTED DATABASE SYSTEM SHOULD BE
COMPLETELY HIDDEN FROM THE END USER.
16. • 2)REPLICATION TRANSPARENCY:
• REPLICATION TRANSPARENCY ENSURES THAT REPLICATION OF DATABASES IS
HIDDEN FROM THE USERS. IT ENABLES USERS TO QUERY A TABLE AS IF
ONLY A SINGLE COPY OF THE TABLE EXISTS. ... ALSO, IN CASE OF FAILURE OF A
SITE, THE USER CAN STILL PROCEED WITH THEIR QUERIES USING REPLICATED
COPIES WITHOUT ANY KNOWLEDGE OF THE FAILURE.
17. • NAMING TRANSPARENCY:
• A TRANSPARENCY IS SOME ASPECT OF THE DISTRIBUTED SYSTEM THAT
IS HIDDEN FROM THE USER (PROGRAMMER, SYSTEM DEVELOPER, USER OR
APPLICATION PROGRAM). A TRANSPARENCY IS PROVIDED BY INCLUDING SOME
SET OF MECHANISMS IN THE DISTRIBUTED SYSTEM AT A LAYER BELOW THE
INTERFACE WHERE THE TRANSPARENCY IS REQUIRED.
18. • A RELATIONAL DATABASE IS A TYPE OF DATABASE THAT STORES AND PROVIDES
ACCESS TO DATA POINTS THAT ARE RELATED TO ONE ANOTHER. ... THE COLUMNS
OF THE TABLE HOLD ATTRIBUTES OF THE DATA, AND EACH RECORD USUALLY HAS A
VALUE FOR EACH ATTRIBUTE, MAKING IT EASY TO ESTABLISH THE RELATIONSHIPS
AMONG DATA POINTS.
• PROPERTIES OF RELATIONAL DATABASES
• VALUES ARE ATOMIC.
• ALL OF THE VALUES IN A COLUMN HAVE THE SAME DATA TYPE.
• EACH ROW IS UNIQUE.
• THE SEQUENCE OF COLUMNS IS INSIGNIFICANT.
19. • THE SEQUENCE OF ROWS IS INSIGNIFICANT.
• EACH COLUMN HAS A UNIQUE NAME.
• INTEGRITY CONSTRAINTS MAINTAIN DATA CONSISTENCY ACROSS MULTIPLE
TABLES.
20. • CLIENT-SERVER ARCHITECTURE IS A COMPUTING MODEL IN WHICH THE SERVER
HOSTS, DELIVERS AND MANAGES MOST OF THE RESOURCES AND SERVICES TO
BE CONSUMED BY THE CLIENT. THIS TYPE OF ARCHITECTURE HAS ONE OR MORE
CLIENT COMPUTERS CONNECTED TO A CENTRAL SERVER OVER A NETWORK OR
INTERNET CONNECTION.
• PEER-TO-PEER ARCHITECTURE (P2P ARCHITECTURE) IS A COMMONLY USED
COMPUTER NETWORKING ARCHITECTURE IN WHICH EACH WORKSTATION, OR
NODE, HAS THE SAME CAPABILITIES AND RESPONSIBILITIES. IT IS OFTEN
COMPARED AND CONTRASTED TO THE CLASSIC CLIENT/SERVER ARCHITECTURE,
IN WHICH SOME COMPUTERS ARE DEDICATED TO SERVING OTHERS
21. • 3)MULTI DBMS ARCHITECTURE:
THIS IS AN INTEGRATED DATABASE SYSTEM FORMED BY A COLLECTION OF TWO
OR MORE AUTONOMOUS DATABASE SYSTEMS. MULTI-DBMS CAN BE EXPRESSED
THROUGH SIX LEVELS OF SCHEMAS − MULTI-DATABASE VIEW LEVEL − DEPICTS
MULTIPLE USER VIEWS COMPRISING OF SUBSETS OF THE INTEGRATED
DISTRIBUTED DATABASE
22. • TWO TYPES OF CLIENT-SERVER ARCHITECTURE:
• 1)SINGLE SERVER MULTIPLE CLIENT: A MULTIPLE CLIENT SERVER IS
A TYPE OF SOFTWARE ARCHITECTURE FOR COMPUTER NETWORKS WHERE
CLIENTS, WHICH CAN BE BASIC WORKSTATIONS OR FULLY FUNCTIONAL
PERSONAL COMPUTERS, REQUEST INFORMATION FROM A SERVER COMPUTER. ...
ONE SERVER IS ABLE TO HANDLE DOZENS OF INFORMATION REQUESTS FROM
CLIENT COMPUTERS SIMULTANEOUSLY.
23. • 2)MULTIPLE SERVER MULTIPLE CLIENT: IN THIS ARRANGEMENT THE DATABASE
IS SPREAD OVER SEVERAL SERVERS, AND CLIENTS MAY SEND REQUESTS TO MORE
THAN ONE OF THEM. BY CONTRAST, THE MOST
COMMON TYPE OF CLIENT-SERVER SYSTEM FOR SMALL BUSINESSES
AND HOMES IS THE SINGLE SERVER WITH MULTIPLE CLIENTS.
24. • PEER- TO-PEER ARCHITECTURE FOR DDBMS
• IN THESE SYSTEMS, EACH PEER ACTS BOTH AS A CLIENT AND A SERVER FOR IMPARTING
DATABASE SERVICES. THE PEERS SHARE THEIR RESOURCE WITH OTHER PEERS AND CO-
ORDINATE THEIR ACTIVITIES.
• THIS ARCHITECTURE GENERALLY HAS FOUR LEVELS OF SCHEMAS −
• GLOBAL CONCEPTUAL SCHEMA − DEPICTS THE GLOBAL LOGICAL VIEW OF DATA.
• LOCAL CONCEPTUAL SCHEMA − DEPICTS LOGICAL DATA ORGANIZATION AT EACH SITE.
• LOCAL INTERNAL SCHEMA − DEPICTS PHYSICAL DATA ORGANIZATION AT EACH SITE.
• EXTERNAL SCHEMA − DEPICTS USER VIEW OF DATA.
26. • MULTI - DBMS ARCHITECTURES
• THIS IS AN INTEGRATED DATABASE SYSTEM FORMED BY A COLLECTION OF TWO OR MORE AUTONOMOUS
DATABASE SYSTEMS.
• MULTI-DBMS CAN BE EXPRESSED THROUGH SIX LEVELS OF SCHEMAS −
• MULTI-DATABASE VIEW LEVEL − DEPICTS MULTIPLE USER VIEWS COMPRISING OF SUBSETS OF THE INTEGRATED
DISTRIBUTED DATABASE.
• MULTI-DATABASE CONCEPTUAL LEVEL − DEPICTS INTEGRATED MULTI-DATABASE THAT COMPRISES OF GLOBAL
LOGICAL MULTI-DATABASE STRUCTURE DEFINITIONS.
• MULTI-DATABASE INTERNAL LEVEL − DEPICTS THE DATA DISTRIBUTION ACROSS DIFFERENT SITES AND MULTI-
DATABASE TO LOCAL DATA MAPPING.
• LOCAL DATABASE VIEW LEVEL − DEPICTS PUBLIC VIEW OF LOCAL DATA.
27. • LOCAL DATABASE CONCEPTUAL LEVEL − DEPICTS LOCAL DATA ORGANIZATION
AT EACH SITE.
• LOCAL DATABASE INTERNAL LEVEL − DEPICTS PHYSICAL DATA ORGANIZATION
AT EACH SITE.
• THERE ARE TWO DESIGN ALTERNATIVES FOR MULTI-DBMS −
• MODEL WITH MULTI-DATABASE CONCEPTUAL LEVEL.
• MODEL WITHOUT MULTI-DATABASE CONCEPTUAL LEVEL.
30. • WHAT IS DATA DISTRIBUTION STRATEGY:
• A DATA DISTRIBUTION STRATEGY ALLOCATES DIFFERENT RESOURCES, E.G. A
NUMBER OF DATABASE NODES, TO DIFFERENT CLASSES OF USERS, SO THAT
DATABASE REQUESTS CAN BE ROUTED TO DIFFERENT RESOURCES.
31. • DATA REPLICATION
• DATA REPLICATION IS THE PROCESS OF STORING SEPARATE COPIES
OF THE DATABASE AT TWO OR MORE SITES. IT IS A POPULAR
FAULT TOLERANCE TECHNIQUE OF DISTRIBUTED DATABASES.
32. • ADVANTAGES OF DATA REPLICATION
• RELIABILITY − IN CASE OF FAILURE OF ANY SITE, THE DATABASE SYSTEM
CONTINUES TO WORK SINCE A COPY IS AVAILABLE AT ANOTHER SITE(S).
• REDUCTION IN NETWORK LOAD − SINCE LOCAL COPIES OF DATA ARE
AVAILABLE, QUERY PROCESSING CAN BE DONE WITH REDUCED NETWORK USAGE,
PARTICULARLY DURING PRIME HOURS. DATA UPDATING CAN BE DONE AT NON-
PRIME HOURS.
33. • QUICKER RESPONSE − AVAILABILITY OF LOCAL COPIES OF DATA ENSURES
QUICK QUERY PROCESSING AND CONSEQUENTLY QUICK RESPONSE TIME.
• SIMPLER TRANSACTIONS − TRANSACTIONS REQUIRE FEWER JOINS
OF TABLES LOCATED AT DIFFERENT SITES AND MINIMAL COORDINATION
ACROSS THE NETWORK. THUS, THEY BECOME SIMPLER IN NATURE.
34. • DISADVANTAGES OF DATA REPLICATION
• INCREASED STORAGE REQUIREMENTS − MAINTAINING MULTIPLE COPIES
OF DATA IS ASSOCIATED WITH INCREASED STORAGE COSTS. THE STORAGE
SPACE REQUIRED IS IN MULTIPLES OF THE STORAGE REQUIRED FOR A
CENTRALIZED SYSTEM.
• INCREASED COST AND COMPLEXITY OF DATA UPDATING − EACH
TIME A DATA ITEM IS UPDATED, THE UPDATE NEEDS TO BE REFLECTED IN ALL
THE COPIES OF THE DATA AT THE DIFFERENT SITES. THIS REQUIRES COMPLEX
SYNCHRONIZATION TECHNIQUES AND PROTOCOLS.
35. • UNDESIRABLE APPLICATION – DATABASE COUPLING :
• IF COMPLEX UPDATE MECHANISMS ARE NOT USED, REMOVING DATA
INCONSISTENCY REQUIRES COMPLEX CO-ORDINATION AT APPLICATION LEVEL.
THIS RESULTS IN UNDESIRABLE APPLICATION – DATABASE COUPLING.
36. • UPDATING DISTRIBUTED DATA
• SYNCHRONOUS REPLICATION CONTROL
• IN THE SYNCHRONOUS REPLICATION APPROACH, THE DATABASE IS SYNCHRONIZED
SO THAT ALL THE REPLICAS ALWAYS HAVE THE SAME VALUE. A
TRANSACTION REQUESTING A DATA ITEM WILL SEE THE SAME
VALUE AT ALL THE SITES.
37. • ASYNCHRONOUS REPLICATION CONTROL
• IN THE ASYNCHRONOUS REPLICATION APPROACH, THE REPLICAS DO
NOT ALWAYS MAINTAIN THE SAME VALUE. ONE OR MORE REPLICAS
MAY STORE AN OUTDATED VALUE, AND A TRANSACTION CAN SEE
DIFFERENT VALUES AT DIFFERENT SITES. THE PROCESS OF BRINGING ALL THE
REPLICAS TO THE CURRENT VALUE IS CALLED SYNCHRONIZATION.
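• AS A ROUGH ILLUSTRATION OF THE TWO APPROACHES, THE PYTHON SKETCH BELOW
KEEPS ONE DATA ITEM AT THREE SITES. THE CLASS ReplicatedItem, ITS METHODS
AND THE SITE NAMES ARE HYPOTHETICAL AND DO NOT BELONG TO ANY REAL DBMS; A
REAL SYSTEM WOULD ALSO NEED NETWORKING AND FAILURE HANDLING.

```python
class ReplicatedItem:
    """One data item copied to several sites (site names are made up)."""

    def __init__(self, sites, value):
        self.copies = {site: value for site in sites}
        self.pending = []  # updates not yet applied at every replica

    def write_sync(self, value):
        # Synchronous: every replica is updated before the write returns.
        for site in self.copies:
            self.copies[site] = value

    def write_async(self, primary, value):
        # Asynchronous: only the primary copy changes now; the others lag.
        self.copies[primary] = value
        self.pending.append(value)

    def synchronize(self):
        # Bring all replicas to the most recent pending value.
        if self.pending:
            latest = self.pending[-1]
            for site in self.copies:
                self.copies[site] = latest
            self.pending.clear()


item = ReplicatedItem(["site_a", "site_b", "site_c"], 10)
item.write_sync(20)
after_sync = set(item.copies.values())        # one value everywhere

item.write_async("site_a", 30)
before_catch_up = set(item.copies.values())   # replicas disagree

item.synchronize()
after_catch_up = set(item.copies.values())    # replicas agree again
```

• AFTER write_async THE REPLICAS HOLD TWO DIFFERENT VALUES UNTIL synchronize
RUNS, WHICH IS THE WINDOW IN WHICH A TRANSACTION MAY READ AN OUTDATED COPY.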
38. • FRAGMENTATION. FRAGMENTATION IS THE TASK OF DIVIDING A TABLE INTO
A SET OF SMALLER TABLES. THE SUBSETS OF THE TABLE ARE CALLED
FRAGMENTS. FRAGMENTATION CAN BE OF THREE TYPES: HORIZONTAL,
VERTICAL, AND HYBRID (COMBINATION OF HORIZONTAL AND VERTICAL).
39. • WHAT ARE THE ADVANTAGES OF USING FRAGMENTATION:
• THE MAIN ADVANTAGE OF FRAGMENTATION IS TO IMPROVE THE PERFORMANCE
OF DISTRIBUTED DATABASE DESIGN BY INCREASING THE EFFICIENCY SINCE
DATA IS STORED ONLY WHERE IT IS NEEDED. FRAGMENTS CAN BE ALLOCATED
AT DIFFERENT NETWORK SITES IN A PROCESS CALLED DATA ALLOCATION.
40. • WHY WE USE FRAGMENTATION:
• FRAGMENTATION IS A DATABASE SERVER FEATURE THAT ALLOWS YOU TO
CONTROL WHERE DATA IS STORED AT THE TABLE LEVEL. FRAGMENTATION
ENABLES YOU TO DEFINE GROUPS OF ROWS OR INDEX KEYS WITHIN A TABLE
ACCORDING TO SOME ALGORITHM OR SCHEME . ... YOU CAN USE THIS TABLE
TO ACCESS INFORMATION ABOUT YOUR FRAGMENTED TABLES AND INDEXES.
41. • 1)HORIZONTAL FRAGMENTATION: HF INVOLVES TAKING ROWS
(RECORDS) FROM A TABLE AND PLACING DIFFERENT ROWS AT DIFFERENT NODES
(LOCATIONS). FOR EXAMPLE, THE CUSTOMER TABLE MAY BE FRAGMENTED SUCH
THAT THE CUSTOMERS FOR A GIVEN OFFICE ARE STORED AT THAT OFFICE.
42. • CORRECTNESS RULES OF FRAGMENTATION
• 1)COMPLETENESS: TO ENSURE THAT THERE IS NO LOSS OF DATA DUE TO
FRAGMENTATION. COMPLETENESS PROPERTY ENSURES THIS BY CHECKING
WHETHER ALL THE RECORDS WHICH WERE PART OF A TABLE (BEFORE
FRAGMENTATION) ARE FOUND IN AT LEAST ONE OF THE FRAGMENTS AFTER
FRAGMENTATION.
43. • 2)RECONSTRUCTION: THIS RULE ENSURES THE ABILITY TO RE-CONSTRUCT THE
ORIGINAL TABLE FROM THE FRAGMENTS THAT ARE CREATED. THIS RULE IS TO
CHECK WHETHER THE FUNCTIONAL DEPENDENCIES ARE PRESERVED OR NOT.
• 3)DISJOINTNESS: THIS RULE ENSURES THAT NO RECORD WILL BECOME A PART OF
TWO OR MORE DIFFERENT FRAGMENTS DURING THE FRAGMENTATION PROCESS.
IF A TABLE R IS PARTITIONED INTO FRAGMENTS R1, R2, …, RN,
THEN DISJOINTNESS REQUIRES THE FOLLOWING FOR EVERY PAIR OF DIFFERENT
FRAGMENTS;
• RI ∩ RJ = ∅ FOR ALL I ≠ J
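• THE THREE CORRECTNESS RULES CAN BE CHECKED MECHANICALLY FOR A HORIZONTAL
FRAGMENTATION. THE PYTHON SKETCH BELOW USES THE STUDENT ROWS OF FIGURE 1 AND
FRAGMENTS THEM BY UNIVERSITY. THE FUNCTION NAMES ARE ILLUSTRATIVE ASSUMPTIONS,
AND RECONSTRUCTION IS TESTED HERE ONLY AS "THE UNION GIVES BACK THE TABLE".

```python
from itertools import combinations

# Student rows from Figure 1: (RollNo, Marks, University)
STUDENT = {
    ("T01", 33, "Harvard"), ("T03", 77, "Stanford"),
    ("T04", 23, "California"), ("T02", 89, "California"),
    ("T05", 90, "Harvard"), ("T06", 90, "Harvard"),
    ("T07", 15, "Stanford"),
}


def fragment_by_university(rows):
    """Horizontal fragmentation: one fragment per University value."""
    fragments = {}
    for row in rows:
        fragments.setdefault(row[2], set()).add(row)
    return list(fragments.values())


def is_complete(table, fragments):
    # Every original row appears in at least one fragment.
    return all(any(row in f for f in fragments) for row in table)


def is_reconstructible(table, fragments):
    # For horizontal fragments, the union must give back the table.
    return set().union(*fragments) == table


def is_disjoint(fragments):
    # No row may belong to two different fragments (pairwise check).
    return all(a.isdisjoint(b) for a, b in combinations(fragments, 2))


frags = fragment_by_university(STUDENT)
```

• THE DISJOINTNESS CHECK COMPARES EVERY PAIR OF FRAGMENTS, SINCE AN EMPTY
INTERSECTION OF ALL FRAGMENTS TOGETHER WOULD NOT RULE OUT TWO OF THEM
SHARING A ROW.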
44. • TYPES OF HORIZONTAL FRAGMENTATION:
• 1)PRIMARY HF.
• 2)DERIVED HF.
• EXAMPLE OF HORIZONTAL FRAGMENTATION OF DATA FOR DISTRIBUTED
DATABASE
45. • PRIMARY HORIZONTAL FRAGMENTATION (PHF)
• PRIMARY HORIZONTAL FRAGMENTATION IS A TABLE FRAGMENTATION
TECHNIQUE IN WHICH WE FRAGMENT A SINGLE TABLE ROW-WISE,
USING A SET OF SIMPLE CONDITIONS DEFINED ON THAT TABLE.
• NOTE: CONDITIONS ARE ALSO CALLED PREDICATES.
• SIMPLE PREDICATE
46. • GIVEN A TABLE/RELATION R WITH SET OF ATTRIBUTES [A1, A2, A3, A4, …, AN], A
SIMPLE PREDICATE PI CAN BE EXPRESSED AS FOLLOWS;
• PI : AJ Θ VALUE
• WHERE Θ CAN BE ANY OF THE SYMBOLS IN THE SET {≤, ≥, ≠, <, >, =}, AND VALUE
CAN BE ANY VALUE STORED IN THE TABLE FOR THE ATTRIBUTE AJ. FOR
EXAMPLE, CONSIDER THE FOLLOWING TABLE STUDENT GIVEN IN FIGURE 1;
47. RollNo Marks University
T01 33 Harvard
T03 77 Stanford
T04 23 California
T02 89 California
T05 90 Harvard
T06 90 Harvard
T07 15 Stanford
48. • FIGURE 1: STUDENT TABLE
• FOR THE ABOVE TABLE, WE COULD DEFINE ANY SIMPLE PREDICATES LIKE
UNIVERSITY = ‘CALIFORNIA’, UNIVERSITY= ‘HARVARD’, MARKS < 77 ETC USING
THE ABOVE EXPRESSION “AJ Θ VALUE”.
• SET OF SIMPLE PREDICATES
• SET OF SIMPLE PREDICATES IS SET OF ALL CONDITIONS COLLECTIVELY REQUIRED
TO FRAGMENT A RELATION INTO SUBSETS. FOR A TABLE R, SET OF SIMPLE
PREDICATE CAN BE DEFINED AS;
49. • PREDICATE P = { P1, P2, …, PN}
• EXAMPLE 1
• AS AN EXAMPLE, FOR THE ABOVE TABLE STUDENT, IF SIMPLE CONDITIONS ARE,
MARKS < 77, MARKS ≥ 77, THEN,
• SET OF SIMPLE PREDICATES P1 = {MARKS < 77, MARKS ≥ 77}
50. • MIN-TERM PREDICATE
• WHEN WE FRAGMENT ANY RELATION HORIZONTALLY, WE USE A SINGLE
CONDITION OR A SET OF SIMPLE PREDICATES TO FILTER THE DATA. GIVEN A
RELATION R AND A SET OF SIMPLE PREDICATES, WE CAN FRAGMENT THE RELATION
HORIZONTALLY AS FOLLOWS;
• FRAGMENT, RI = σ_{FI}(R), 1 ≤ I ≤ N
• WHERE σ IS THE SELECTION OPERATOR AND FI IS THE SELECTION FORMULA, A
CONJUNCTION OF SIMPLE PREDICATES (EACH POSSIBLY NEGATED) CALLED A MIN-TERM
PREDICATE, WHICH CAN BE WRITTEN AS FOLLOWS;
51. • MIN-TERM PREDICATE, MI = P1* ∧ P2* ∧ P3* ∧ … ∧ PN*
• HERE, EACH PK* STANDS FOR EITHER PK OR ¬(PK): P1* IS P1 OR ¬(P1), P2* IS P2 OR ¬(P2),
AND SO ON. USING THE CONJUNCTIVE FORM OF THE SIMPLE PREDICATES IN DIFFERENT
COMBINATIONS, WE CAN DERIVE MANY SUCH MIN-TERM PREDICATES.
• FOR EXAMPLE 1 STATED PREVIOUSLY, WE CAN DERIVE THE SET OF MIN-TERM PREDICATES
USING THE RULES STATED ABOVE AS FOLLOWS;
• WE WILL GET 2^N MIN-TERM PREDICATES, WHERE N IS THE NUMBER OF SIMPLE PREDICATES IN
THE GIVEN PREDICATE SET. FOR P1, WE HAVE 2 SIMPLE PREDICATES. HENCE, WE WILL GET 4
(2^2) POSSIBLE COMBINATIONS OF MIN-TERM PREDICATES AS FOLLOWS;
• M1 = {MARKS < 77 ∧ MARKS ≥ 77}
52. • M2 = {MARKS < 77 ∧ ¬(MARKS ≥ 77)}
• M3 = {¬(MARKS < 77) ∧ MARKS ≥ 77}
• M4 = {¬(MARKS < 77) ∧ ¬(MARKS ≥ 77)}
• OUR NEXT STEP IS TO CHOOSE THE MIN-TERM PREDICATES WHICH CAN SATISFY CERTAIN
CONDITIONS TO FRAGMENT A TABLE AND ELIMINATE THE OTHERS WHICH ARE NOT USEFUL. FOR
EXAMPLE, THE ABOVE SET OF MIN-TERM PREDICATES CAN BE APPLIED EACH AS A FORMULA FI
STATED IN THE ABOVE RULE FOR FRAGMENT RI AS FOLLOWS;
• STUDENT1 = σ_{MARKS < 77 ∧ MARKS ≥ 77}(STUDENT)
• WHICH CAN BE WRITTEN IN EQUIVALENT SQL QUERY AS,
• STUDENT1
• SELECT * FROM STUDENT WHERE MARKS < 77 AND MARKS >= 77;
• THIS CONDITION CAN NEVER BE TRUE, SO STUDENT1 IS EMPTY AND M1 IS ELIMINATED.
53. • STUDENT2 = σ_{MARKS < 77 ∧ ¬(MARKS ≥ 77)}(STUDENT)
• WHICH CAN BE WRITTEN IN EQUIVALENT SQL QUERY AS,
• STUDENT2
• SELECT * FROM STUDENT WHERE MARKS < 77 AND NOT (MARKS >= 77); HERE
NOT (MARKS >= 77) IS EQUIVALENT TO MARKS < 77.
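• THE ENUMERATION OF MIN-TERM PREDICATES AND THE ELIMINATION OF USELESS ONES
CAN BE SKETCHED IN PYTHON AS FOLLOWS. THE STUDENT ROWS COME FROM FIGURE 1;
THE NAMES minterm_fragments AND SIMPLE_PREDICATES ARE ASSUMPTIONS MADE FOR
THIS ILLUSTRATION ONLY.

```python
from itertools import product

# Student rows from Figure 1: (RollNo, Marks, University)
STUDENT = [
    ("T01", 33, "Harvard"), ("T03", 77, "Stanford"),
    ("T04", 23, "California"), ("T02", 89, "California"),
    ("T05", 90, "Harvard"), ("T06", 90, "Harvard"),
    ("T07", 15, "Stanford"),
]

# Simple predicates over one row; row[1] is Marks.
SIMPLE_PREDICATES = [
    ("Marks < 77", lambda row: row[1] < 77),
    ("Marks >= 77", lambda row: row[1] >= 77),
]


def minterm_fragments(rows, predicates):
    """Return {min-term label: selected rows} for all 2^n min-terms."""
    result = {}
    # Each simple predicate appears either as-is (True) or negated (False).
    for signs in product([True, False], repeat=len(predicates)):
        label = " AND ".join(
            name if keep else f"NOT ({name})"
            for (name, _), keep in zip(predicates, signs)
        )
        result[label] = [
            row for row in rows
            if all(test(row) == keep
                   for (_, test), keep in zip(predicates, signs))
        ]
    return result


fragments = minterm_fragments(STUDENT, SIMPLE_PREDICATES)
# Keep only the min-terms that actually select rows.
useful = {label: rows for label, rows in fragments.items() if rows}
```

• OF THE FOUR MIN-TERMS, THE TWO CONTRADICTORY ONES (M1 AND M4) SELECT NO ROWS,
SO ONLY TWO NON-EMPTY FRAGMENTS REMAIN: MARKS BELOW 77 AND MARKS OF 77 OR MORE.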
54. • DERIVED HORIZONTAL FRAGMENTATION: THE PROCESS OF CREATING
HORIZONTAL FRAGMENTS OF A TABLE IN QUESTION BASED ON THE ALREADY
CREATED HORIZONTAL FRAGMENTS OF ANOTHER RELATION (FOR EXAMPLE,
BASE TABLE) IS CALLED DERIVED HORIZONTAL FRAGMENTATION. ... FOR
EXAMPLE, CONSIDER A RELATION WHICH IS CONNECTED WITH ANOTHER
RELATION USING FOREIGN KEY CONCEPT
55. • CONSIDER AN EXAMPLE, WHERE AN ORGANIZATION MAINTAINS INFORMATION
ABOUT ITS CUSTOMERS. IT STORES INFORMATION ABOUT THE CUSTOMER IN THE
CUSTOMER TABLE AND THE CUSTOMER ADDRESSES IN THE C_ADDRESS TABLE AS
FOLLOWS;
• CUSTOMER(CID, CNAME, PROD_PURCHASED, SHOP_LOCATION)
• C_ADDRESS(CID, C_ADDRESS)
• THE TABLE CUSTOMER STORES INFORMATION ABOUT THE CUSTOMER, THE
PRODUCT PURCHASED FROM THEIR SHOP, AND THE SHOP LOCATION WHERE THE
PRODUCT IS PURCHASED. C_ADDRESS STORES INFORMATION ABOUT PERMANENT
AND PRESENT ADDRESSES OF THE CUSTOMER. HERE, CUSTOMER IS THE OWNER
RELATION AND C_ADDRESS IS THE MEMBER RELATION.
56. CID CNAME PROD_PURCHASED SHOP_LOCATION
C001 Ram Air Conditioner Mumbai
C002 Guru Television Chennai
C010 Murugan Television Coimbatore
C003 Yuvraj DVD Player Pune
C004 Gopinath Washing machine Coimbatore
57. CID C_ADDRESS
C001 Bandra, Mumbai
C001 XYZ, Pune
C002 T.Nagar, Chennai
C002 Kovil street, Madurai
C003 ABX, Pune
C004 Gandhipuram, Ooty
C004 North street, Erode
C010 Peelamedu, Coimbatore
58. • IF THE ORGANIZATION FRAGMENTS THE RELATION
CUSTOMER ON THE SHOP_LOCATION ATTRIBUTE, IT NEEDS TO CREATE 4
FRAGMENTS USING THE HORIZONTAL FRAGMENTATION TECHNIQUE, AS GIVEN IN
FIGURE 3 BELOW.
CID CNAME PROD_PURCHASED SHOP_LOCATION
C001 Ram Air Conditioner Mumbai
CUSTOMER1
59. CID CNAME PROD_PURCHASED SHOP_LOCATION
C002 Guru Television Chennai
CUSTOMER2
CID CNAME PROD_PURCHASED SHOP_LOCATION
C010 Murugan Television Coimbatore
C004 Gopinath Washing machine Coimbatore
CUSTOMER3
CID CNAME PROD_PURCHASED SHOP_LOCATION
C003 Yuvraj DVD Player Pune
CUSTOMER4
61. • NOW, IT IS NECESSARY TO FRAGMENT THE SECOND RELATION C_ADDRESS BASED ON THE
FRAGMENTS CREATED ON THE CUSTOMER RELATION, BECAUSE IF WE FRAGMENT C_ADDRESS IN
ANY OTHER WAY, RELATED DATA MAY END UP AT DIFFERENT LOCATIONS.
FOR EXAMPLE, IF C_ADDRESS IS FRAGMENTED ON THE LAST DIGIT OF THE
CID ATTRIBUTE, IT WILL END UP WITH MORE FRAGMENTS, AND THE DATA MAY
NOT BE STORED IN THE SAME LOCATION WHERE THE CUSTOMER INFORMATION IS STORED. THAT
IS, THE INFORMATION FOR CUSTOMER ‘RAM’ IS STORED IN MUMBAI AND HIS ADDRESS INFORMATION
MIGHT BE STORED SOMEWHERE ELSE. TO AVOID THIS, THE TABLE C_ADDRESS,
WHICH IS A MEMBER TABLE OF CUSTOMER, MUST BE FRAGMENTED INTO FOUR
FRAGMENTS BASED ON THE CUSTOMER TABLE FRAGMENTS GIVEN IN FIGURE 3. THIS TYPE
OF FRAGMENTATION BASED ON THE OWNER RELATION IS CALLED DERIVED HORIZONTAL
FRAGMENTATION. IT WORKS FOR RELATIONS THAT ARE LINKED BY AN EQUI-JOIN,
BECAUSE AN EQUI-JOIN CAN BE EXPRESSED USING SEMI-JOINS.
62. • THE FRAGMENTATION OF C_ADDRESS IS DONE AS A SET OF SEMI-JOINS AS
FOLLOWS.
• C_ADDRESS1 = C_ADDRESS ⋉ CUSTOMER1
• C_ADDRESS2 = C_ADDRESS ⋉ CUSTOMER2
• C_ADDRESS3 = C_ADDRESS ⋉ CUSTOMER3
• C_ADDRESS4 = C_ADDRESS ⋉ CUSTOMER4
63. • THIS WILL RESULT IN FOUR FRAGMENTS OF C_ADDRESS WHERE THE CUSTOMER
ADDRESS OF ALL CUSTOMERS OF FRAGMENT CUSTOMER1 WILL GO INTO
C_ADDRESS1, AND THE CUSTOMER ADDRESS OF ALL CUSTOMERS OF FRAGMENT
CUSTOMER2 WILL GO INTO C_ADDRESS2, AND SO ON. THE RESULTANT
FRAGMENT OF C_ADDRESS WILL BE THE FOLLOWING.
• FIGURE 4: DERIVED HORIZONTAL FRAGMENTS OF FIGURE 2 AS A MEMBER
RELATION OF THE OWNER RELATION’S FRAGMENTS FROM FIGURE 3
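• THE SEMI-JOINS ABOVE CAN BE REPRODUCED WITH A SMALL PYTHON SKETCH OVER THE
CUSTOMER AND C_ADDRESS ROWS SHOWN EARLIER. THE HELPER NAMES primary_fragments
AND semi_join ARE ASSUMPTIONS, AND THE FRAGMENTS ARE KEYED BY SHOP LOCATION
RATHER THAN NUMBERED 1 TO 4.

```python
CUSTOMER = [
    ("C001", "Ram", "Air Conditioner", "Mumbai"),
    ("C002", "Guru", "Television", "Chennai"),
    ("C010", "Murugan", "Television", "Coimbatore"),
    ("C003", "Yuvraj", "DVD Player", "Pune"),
    ("C004", "Gopinath", "Washing machine", "Coimbatore"),
]
C_ADDRESS = [
    ("C001", "Bandra, Mumbai"), ("C001", "XYZ, Pune"),
    ("C002", "T.Nagar, Chennai"), ("C002", "Kovil street, Madurai"),
    ("C003", "ABX, Pune"), ("C004", "Gandhipuram, Ooty"),
    ("C004", "North street, Erode"), ("C010", "Peelamedu, Coimbatore"),
]


def primary_fragments(customers):
    """Primary HF of the owner relation on Shop_Location (column 3)."""
    fragments = {}
    for row in customers:
        fragments.setdefault(row[3], []).append(row)
    return fragments


def semi_join(member, owner_fragment):
    """member semi-join owner_fragment on CId: keep matching member rows."""
    owner_ids = {row[0] for row in owner_fragment}
    return [row for row in member if row[0] in owner_ids]


customer_frags = primary_fragments(CUSTOMER)
# Derived HF: one C_ADDRESS fragment per CUSTOMER fragment.
address_frags = {
    location: semi_join(C_ADDRESS, frag)
    for location, frag in customer_frags.items()
}
```

• EACH ADDRESS ROW LANDS IN THE FRAGMENT OF ITS CUSTOMER'S SHOP LOCATION, FOR
EXAMPLE BOTH ADDRESSES OF C001 GO TO THE MUMBAI FRAGMENT EVEN THOUGH ONE OF
THEM IS IN PUNE.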
66. • CHECKING FOR CORRECTNESS
• COMPLETENESS: CHECKING THE COMPLETENESS OF A DERIVED HORIZONTAL FRAGMENTATION IS MORE
DIFFICULT THAN FOR A PRIMARY HORIZONTAL FRAGMENTATION, BECAUSE THE PREDICATES USED
DETERMINE THE FRAGMENTATION OF TWO RELATIONS. FORMALLY, LET THE MEMBER RELATION R
AND THE OWNER RELATION S BE FRAGMENTED INTO {R1, R2, …, RN} AND {S1, S2, …, SN},
AND LET A BE THEIR COMMON (JOIN) ATTRIBUTE. THEN, FOR EACH TUPLE T OF RI,
THERE SHOULD BE A TUPLE T' OF SI THAT HAS THE SAME VALUE FOR A. THIS IS KNOWN
AS REFERENTIAL INTEGRITY.
• THE DERIVED FRAGMENTATION OF C_ADDRESS IS COMPLETE, BECAUSE THE VALUES OF THE
COMMON ATTRIBUTE CID IN THE FRAGMENTS CUSTOMERI AND C_ADDRESSI ARE THE SAME.
FOR EXAMPLE, THE VALUE PRESENT IN CID OF CUSTOMER1 IS ALSO PRESENT IN
C_ADDRESS1, AND IN NO OTHER C_ADDRESS FRAGMENT, ETC.
67. • RECONSTRUCTION: RECONSTRUCTION OF A RELATION FROM ITS
FRAGMENTS IS PERFORMED BY THE UNION OPERATOR IN BOTH THE PRIMARY
AND THE DERIVED HORIZONTAL FRAGMENTATION
68. • 2)VERTICAL FRAGMENTATION: VERTICAL FRAGMENTATION REFERS TO THE
PROCESS OF DECOMPOSING A TABLE VERTICALLY BY ATTRIBUTES, I.E.
COLUMNS. IN THIS FRAGMENTATION, SOME OF THE ATTRIBUTES ARE STORED IN
ONE SYSTEM AND THE REST ARE STORED IN OTHER SYSTEMS. THIS IS BECAUSE
EACH SITE MAY NOT NEED ALL COLUMNS OF A TABLE.
• =>EACH SITE MAY NOT NEED ALL THE ATTRIBUTES OF A RELATION.
• =>A VERTICAL FRAGMENT IS A SUBSET OF A RELATION CREATED FROM A SUBSET OF ITS COLUMNS.
69. • =>A VF OF A RELATION R PRODUCES FRAGMENTS R1, R2, R3, …, RN, EACH OF WHICH
CONTAINS A SUBSET OF THE ATTRIBUTES OF R AND THE PRIMARY KEY OF R.
• =>RECONSTRUCTION OF A VF IS DONE WITH THE JOIN OPERATOR.
70. • =>IN ORDER TO TAKE CARE OF RESTORATION, EACH FRAGMENT MUST
CONTAIN THE PRIMARY KEY FIELD(S) IN A TABLE. THE FRAGMENTATION SHOULD
BE IN SUCH A MANNER THAT WE CAN REBUILD A TABLE FROM THE FRAGMENT
BY TAKING THE NATURAL JOIN OPERATION AND TO MAKE IT POSSIBLE WE NEED
TO INCLUDE A SPECIAL ATTRIBUTE CALLED TUPLE-ID TO THE SCHEMA. FOR
THIS PURPOSE, A USER CAN USE ANY SUPER KEY. AND BY THIS, THE TUPLES OR
ROWS CAN BE LINKED TOGETHER. THE PROJECTION IS AS FOLLOWS:
71. • FOR EXAMPLE, FOR THE EMPLOYEE TABLE WE HAVE T1 AS :
• ENO ENAME DESIGN TUPLE_ID
• 101 A ABC 1
• 102 B ABC 2
• 103 C ABC 3
• 104 D ABC 4
• 105 E ABC 5
72. • π_{A1, A2, …, AN}(T)
• WHERE π IS THE RELATIONAL ALGEBRA PROJECTION OPERATOR,
• A1, …, AN ARE THE ATTRIBUTES OF T, AND
• T IS THE TABLE (RELATION).
73. • THE SECOND SUB-TABLE OF THE RELATION AFTER VERTICAL FRAGMENTATION IS
GIVEN AS FOLLOWS :
• SALARY DEP TUPLE_ID
• 3000 1 1
• 4000 2 2
• 5500 3 3
• 5000 1 4
• 2000 4 5
74. • THIS IS T2, AND TO GET BACK THE ORIGINAL T, WE JOIN THESE TWO
FRAGMENTS T1 AND T2 ON TUPLE_ID AS π_{EMPLOYEE}(T1 ⋈ T2)
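• THE PROJECTION INTO T1 AND T2 AND THE JOIN BACK CAN BE SKETCHED IN PYTHON.
THE ROWS COMBINE THE T1 AND T2 TABLES ABOVE; THE SKETCH ASSUMES THAT TUPLE_ID
TAKES THE UNIQUE VALUES 1 TO 5, AND THE FUNCTION NAMES ARE ILLUSTRATIVE ONLY.

```python
# Employee rows: (Eno, Ename, Design, Salary, Dep, Tuple_id).
# Tuple_id is assumed unique per row so the join can match rows up again.
EMPLOYEE = [
    (101, "A", "ABC", 3000, 1, 1),
    (102, "B", "ABC", 4000, 2, 2),
    (103, "C", "ABC", 5500, 3, 3),
    (104, "D", "ABC", 5000, 1, 4),
    (105, "E", "ABC", 2000, 4, 5),
]
COLUMNS = ("Eno", "Ename", "Design", "Salary", "Dep", "Tuple_id")


def project(rows, columns, wanted):
    """Relational projection: keep only the wanted columns of each row."""
    idx = [columns.index(name) for name in wanted]
    return [tuple(row[i] for i in idx) for row in rows]


# Vertical fragments; both carry Tuple_id as their last column.
t1 = project(EMPLOYEE, COLUMNS, ("Eno", "Ename", "Design", "Tuple_id"))
t2 = project(EMPLOYEE, COLUMNS, ("Salary", "Dep", "Tuple_id"))


def join_on_tuple_id(left, right):
    """Natural join of two fragments on their last column (Tuple_id)."""
    by_id = {row[-1]: row for row in right}
    return [l[:-1] + by_id[l[-1]] for l in left if l[-1] in by_id]


rebuilt = join_on_tuple_id(t1, t2)
```

• IF TWO ROWS SHARED A TUPLE_ID, THE JOIN COULD NOT TELL THEM APART, WHICH IS
WHY EVERY VERTICAL FRAGMENT MUST CARRY A KEY.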
• 3. MIXED FRAGMENTATION – THE COMBINATION OF VERTICAL FRAGMENTATION
OF A TABLE FOLLOWED BY FURTHER HORIZONTAL FRAGMENTATION OF SOME
FRAGMENTS IS CALLED MIXED OR HYBRID FRAGMENTATION. FOR DEFINING THIS
TYPE OF FRAGMENTATION WE USE THE SELECT AND THE PROJECT OPERATIONS
OF RELATIONAL ALGEBRA. IN SOME SITUATIONS, THE HORIZONTAL AND THE
VERTICAL FRAGMENTATION ISN’T ENOUGH TO DISTRIBUTE DATA FOR SOME
APPLICATIONS AND IN THAT CONDITIONS, WE NEED A FRAGMENTATION CALLED
A MIXED FRAGMENTATION.
75. • MIXED FRAGMENTATION CAN BE DONE IN TWO DIFFERENT WAYS:
• THE FIRST METHOD IS TO FIRST CREATE A SET OR GROUP OF HORIZONTAL
FRAGMENTS AND THEN CREATE VERTICAL FRAGMENTS FROM ONE OR MORE OF
THE HORIZONTAL FRAGMENTS.
• THE SECOND METHOD IS TO FIRST CREATE A SET OR GROUP OF VERTICAL
FRAGMENTS AND THEN CREATE HORIZONTAL FRAGMENTS FROM ONE OR MORE
OF THE VERTICAL FRAGMENTS.
• THE ORIGINAL RELATION CAN BE OBTAINED BY A COMBINATION OF JOIN AND
UNION OPERATIONS. A MIXED FRAGMENT ITSELF IS DEFINED BY COMBINING
SELECTION (σ) AND PROJECTION (π) IN EITHER ORDER:
76. • σ_{P}(π_{A1, A2, …, AN}(T))
• π_{A1, A2, …, AN}(σ_{P}(T))
• FOR EXAMPLE, FOR OUR EMPLOYEE TABLE, A MIXED FRAGMENT
IS π_{ENAME, DESIGN}(σ_{ENO > 102}(EMPLOYEE))
• THE RESULT OF THIS FRAGMENTATION IS:
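• THE FRAGMENT CAN BE COMPUTED WITH A SHORT PYTHON SKETCH THAT APPLIES THE
SELECTION ENO > 102 FOLLOWED BY THE PROJECTION ON ENAME AND DESIGN. THE ROWS
ARE THE T1 EMPLOYEE ROWS SHOWN EARLIER; THE HELPERS select AND project ARE
ASSUMED NAMES, NOT PART OF ANY LIBRARY.

```python
EMPLOYEE = [
    {"Eno": 101, "Ename": "A", "Design": "ABC"},
    {"Eno": 102, "Ename": "B", "Design": "ABC"},
    {"Eno": 103, "Ename": "C", "Design": "ABC"},
    {"Eno": 104, "Ename": "D", "Design": "ABC"},
    {"Eno": 105, "Ename": "E", "Design": "ABC"},
]


def select(rows, predicate):
    """Relational selection: keep the rows that satisfy the predicate."""
    return [row for row in rows if predicate(row)]


def project(rows, columns):
    """Relational projection: keep only the named columns."""
    return [{c: row[c] for c in columns} for row in rows]


# Horizontal step first (Eno > 102), then vertical step (Ename, Design).
mixed_fragment = project(
    select(EMPLOYEE, lambda row: row["Eno"] > 102),
    ["Ename", "Design"],
)
```

• THE FRAGMENT HOLDS THE THREE ROWS (C, ABC), (D, ABC) AND (E, ABC).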
78. • HYBRID FRAGMENTATION:
• IN HYBRID FRAGMENTATION, A COMBINATION OF HORIZONTAL AND VERTICAL
FRAGMENTATION TECHNIQUES IS USED. THIS IS THE MOST FLEXIBLE
FRAGMENTATION TECHNIQUE SINCE IT GENERATES FRAGMENTS WITH MINIMAL
EXTRANEOUS INFORMATION. HOWEVER, RECONSTRUCTION OF THE ORIGINAL
TABLE IS OFTEN AN EXPENSIVE TASK.
79. • HYBRID FRAGMENTATION CAN BE DONE IN TWO ALTERNATIVE WAYS −
• AT FIRST, GENERATE A SET OF HORIZONTAL FRAGMENTS; THEN GENERATE
VERTICAL FRAGMENTS FROM ONE OR MORE OF THE HORIZONTAL FRAGMENTS.
• AT FIRST, GENERATE A SET OF VERTICAL FRAGMENTS; THEN GENERATE
HORIZONTAL FRAGMENTS FROM ONE OR MORE OF THE VERTICAL FRAGMENTS
80. • WHAT IS A TRANSACTION IN DISTRIBUTED DBMS:
• A PROGRAM THAT INCLUDES A COLLECTION OF DATABASE OPERATIONS WHICH
ARE EXECUTED AS A LOGICAL UNIT OF PROCESSING THE DATA IS KNOWN AS A
TRANSACTION. IN A TRANSACTION ONE OR MORE OF THE DATA OPERATIONS
ARE PERFORMED SUCH AS INSERT, UPDATE, DELETE OR RETRIEVE.
81. • TYPES :
• 1)LOCAL TRANSACTION
• 2)GLOBAL TRANSACTION
• 1) SOME SOFTWARE PLATFORMS DO NOT PROVIDE TRANSACTION
COORDINATION AS PART OF THE KERNEL OPERATING SYSTEM. WHEN,
INSTEAD, EACH RESOURCE MANAGER INVOLVED IS SEPARATELY COORDINATING
ITS OWN CHANGES, AND ONLY ITS CHANGES, THE TRANSACTION IS KNOWN AS
A LOCAL TRANSACTION. ...
82. • WITH RESPECT TO A TRANSACTION, A SITE PLAYS ONE OF TWO ROLES:
• A)COORDINATING SITE:
• IT IS THE SITE WHERE THE TRANSACTION IS INITIATED.
• B)PARTICIPATING SITE:
• THESE ARE THE SITES WHERE SUBTRANSACTIONS ARE EXECUTED.
83. • 2)GLOBAL TRANSACTION:
• A GLOBAL TRANSACTION IS A MECHANISM THAT ALLOWS A SET OF
PROGRAMMING TASKS, POTENTIALLY USING MORE THAN ONE RESOURCE
MANAGER AND POTENTIALLY EXECUTING ON MULTIPLE SERVERS, TO BE
TREATED AS ONE LOGICAL UNIT. ... A GLOBAL TRANSACTION MAY BE
COMPOSED OF SEVERAL LOCAL TRANSACTIONS, EACH ACCESSING THE SAME
RESOURCE MANAGER.
84. • TRANSACTION MANAGER(TM):
• EACH SITE HAS ITS OWN TM. IT MANAGES THE EXECUTION OF THOSE
TRANSACTIONS OR SUBTRANSACTIONS THAT ACCESS DATA STORED AT THAT SITE.
• TRANSACTION COORDINATOR:
• IT IS PRESENT AT EACH SITE AND IS RESPONSIBLE FOR COORDINATING THE
EXECUTION OF ALL TRANSACTIONS INITIATED AT THAT SITE.
85. • DISTRIBUTED TRANSACTION ARCHITECTURE:
• DISTRIBUTED TRANSACTION: IS A SET OF OPERATIONS ON DATA THAT IS
PERFORMED ACROSS TWO OR MORE DATA REPOSITORIES (ESPECIALLY
DATABASES). IT IS TYPICALLY COORDINATED ACROSS SEPARATE NODES
CONNECTED BY A NETWORK, BUT MAY ALSO SPAN MULTIPLE DATABASES ON A
SINGLE SERVER.
87. • ACID PROPERTIES:
• A TRANSACTION IS A SINGLE LOGICAL UNIT OF WORK WHICH ACCESSES AND
POSSIBLY MODIFIES THE CONTENTS OF A DATABASE. TRANSACTIONS ACCESS
DATA USING READ AND WRITE OPERATIONS.
IN ORDER TO MAINTAIN CONSISTENCY IN A DATABASE, BEFORE AND AFTER THE
TRANSACTION, CERTAIN PROPERTIES ARE FOLLOWED. THESE ARE
CALLED ACID PROPERTIES.
89. • ATOMICITY
BY THIS, WE MEAN THAT EITHER THE ENTIRE TRANSACTION TAKES PLACE AT
ONCE OR DOESN’T HAPPEN AT ALL. THERE IS NO MIDWAY I.E. TRANSACTIONS
DO NOT OCCUR PARTIALLY. EACH TRANSACTION IS CONSIDERED AS ONE UNIT
AND EITHER RUNS TO COMPLETION OR IS NOT EXECUTED AT ALL. IT INVOLVES
THE FOLLOWING TWO OPERATIONS.
—ABORT: IF A TRANSACTION ABORTS, CHANGES MADE TO DATABASE ARE NOT
VISIBLE.
—COMMIT: IF A TRANSACTION COMMITS, CHANGES MADE ARE VISIBLE.
ATOMICITY IS ALSO KNOWN AS THE ‘ALL OR NOTHING RULE’.
90. • CONSIDER THE FOLLOWING TRANSACTION T CONSISTING OF T1 AND T2:
TRANSFER OF 100 FROM ACCOUNT X TO ACCOUNT Y.
91. • IF THE TRANSACTION FAILS AFTER COMPLETION OF T1 BUT BEFORE
COMPLETION OF T2 (SAY, AFTER WRITE(X) BUT BEFORE WRITE(Y)), THEN THE
AMOUNT HAS BEEN DEDUCTED FROM X BUT NOT ADDED TO Y. THIS RESULTS IN
AN INCONSISTENT DATABASE STATE. THEREFORE, THE TRANSACTION MUST BE
EXECUTED IN ITS ENTIRETY IN ORDER TO ENSURE CORRECTNESS OF THE DATABASE
STATE.
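• THE ALL-OR-NOTHING BEHAVIOUR CAN BE TRIED OUT WITH PYTHON'S STANDARD sqlite3
MODULE. IN THIS SKETCH THE TABLE NAME account, THE FUNCTION NAMES AND THE
SIMULATED CRASH ARE ASSUMPTIONS; THE STARTING BALANCES X = 500 AND Y = 200
ARE THE ONES USED IN THE CONSISTENCY EXAMPLE OF THESE NOTES.

```python
import sqlite3


def make_bank():
    """Create an in-memory database with accounts X = 500 and Y = 200."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER)"
    )
    conn.executemany(
        "INSERT INTO account VALUES (?, ?)", [("X", 500), ("Y", 200)]
    )
    conn.commit()
    return conn


def transfer(conn, amount, fail_between=False):
    """Move amount from X to Y as one transaction (all or nothing)."""
    try:
        # T1: deduct from X
        conn.execute(
            "UPDATE account SET balance = balance - ? WHERE name = 'X'",
            (amount,),
        )
        if fail_between:
            raise RuntimeError("simulated failure after write(X)")
        # T2: add to Y
        conn.execute(
            "UPDATE account SET balance = balance + ? WHERE name = 'Y'",
            (amount,),
        )
        conn.commit()    # both writes become visible together
        return True
    except RuntimeError:
        conn.rollback()  # undo the partial write to X
        return False


def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM account"))
```

• WHEN THE FAILURE IS SIMULATED, THE ROLLBACK RESTORES X TO 500, SO THE TOTAL
OF 700 IS THE SAME BEFORE AND AFTER EITHER OUTCOME.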
92. • CONSISTENCY
THIS MEANS THAT INTEGRITY CONSTRAINTS MUST BE MAINTAINED SO THAT
THE DATABASE IS CONSISTENT BEFORE AND AFTER THE TRANSACTION. IT
REFERS TO THE CORRECTNESS OF A DATABASE. REFERRING TO THE EXAMPLE
ABOVE,
THE TOTAL AMOUNT BEFORE AND AFTER THE TRANSACTION MUST BE
MAINTAINED.
TOTAL BEFORE T OCCURS = 500 + 200 = 700.
TOTAL AFTER T OCCURS = 400 + 300 = 700.
THEREFORE, DATABASE IS CONSISTENT. INCONSISTENCY OCCURS IN
CASE T1 COMPLETES BUT T2 FAILS. AS A RESULT T IS INCOMPLETE.
93. • ISOLATION
THIS PROPERTY ENSURES THAT MULTIPLE TRANSACTIONS CAN OCCUR
CONCURRENTLY WITHOUT LEADING TO THE INCONSISTENCY OF DATABASE
STATE. TRANSACTIONS OCCUR INDEPENDENTLY WITHOUT INTERFERENCE.
CHANGES OCCURRING IN A PARTICULAR TRANSACTION WILL NOT BE VISIBLE TO
ANY OTHER TRANSACTION UNTIL THAT PARTICULAR CHANGE IN THAT
TRANSACTION IS WRITTEN TO MEMORY OR HAS BEEN COMMITTED. THIS
PROPERTY ENSURES THAT THE CONCURRENT EXECUTION OF TRANSACTIONS
WILL RESULT IN A STATE THAT IS EQUIVALENT TO A STATE ACHIEVED IF THEY
WERE EXECUTED SERIALLY IN SOME ORDER.
94. • LET X = 500, Y = 500.
CONSIDER TWO TRANSACTIONS T AND T''.
95. • SUPPOSE T HAS BEEN EXECUTED TILL READ(Y) AND THEN T'' STARTS. AS A
RESULT, INTERLEAVING OF OPERATIONS TAKES PLACE, DUE TO WHICH T'' READS THE
CORRECT VALUE OF X BUT AN INCORRECT VALUE OF Y, AND THE SUM COMPUTED BY
T'': (X + Y = 50,000 + 500 = 50,500)
IS THUS NOT CONSISTENT WITH THE SUM AT THE END OF TRANSACTION
T: (X + Y = 50,000 + 450 = 50,450).
THIS RESULTS IN DATABASE INCONSISTENCY, DUE TO A DIFFERENCE OF 50 UNITS.
HENCE, TRANSACTIONS MUST TAKE PLACE IN ISOLATION AND CHANGES
SHOULD BE VISIBLE ONLY AFTER THEY HAVE BEEN COMMITTED.
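• THE INTERLEAVING CAN BE REPLAYED IN A FEW LINES OF PYTHON. THE OPERATIONS OF
T ARE NOT LISTED IN THESE NOTES, SO THE SKETCH ASSUMES, FROM THE NUMBERS IN
THE EXAMPLE, THAT T MULTIPLIES X BY 100 AND SUBTRACTS 50 FROM Y, AND THAT T''
ONLY READS X AND Y AND ADDS THEM. THE FUNCTION NAMES ARE ILLUSTRATIVE.

```python
def run_interleaved():
    """T'' reads X and Y while T is only half finished."""
    db = {"X": 500, "Y": 500}
    # T: read(X), X := X * 100, write(X), read(Y)   (assumed operations)
    db["X"] = db["X"] * 100
    # T'' runs here, before T has written the new Y.
    seen_sum = db["X"] + db["Y"]
    # T resumes: Y := Y - 50, write(Y)
    db["Y"] = db["Y"] - 50
    return seen_sum, db["X"] + db["Y"]


def run_serial():
    """T finishes completely before T'' starts."""
    db = {"X": 500, "Y": 500}
    db["X"] = db["X"] * 100
    db["Y"] = db["Y"] - 50
    seen_sum = db["X"] + db["Y"]
    return seen_sum, db["X"] + db["Y"]


interleaved = run_interleaved()  # (sum seen by T'', sum after T)
serial = run_serial()
```

• IN THE INTERLEAVED RUN T'' SEES 50,500 WHILE THE DATABASE ENDS AT 50,450; IN
THE SERIAL RUN BOTH VALUES ARE 50,450.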
96. • DURABILITY:
THIS PROPERTY ENSURES THAT ONCE THE TRANSACTION HAS COMPLETED
EXECUTION, THE UPDATES AND MODIFICATIONS TO THE DATABASE ARE
STORED IN AND WRITTEN TO DISK AND THEY PERSIST EVEN IF A SYSTEM
FAILURE OCCURS. THESE UPDATES NOW BECOME PERMANENT AND ARE STORED
IN NON-VOLATILE MEMORY. THE EFFECTS OF THE TRANSACTION, THUS, ARE
NEVER LOST.
97. • THE ACID PROPERTIES, IN TOTALITY, PROVIDE A MECHANISM TO ENSURE
CORRECTNESS AND CONSISTENCY OF A DATABASE IN A WAY SUCH THAT EACH
TRANSACTION IS A GROUP OF OPERATIONS THAT ACTS AS A SINGLE UNIT,
PRODUCES CONSISTENT RESULTS, ACTS IN ISOLATION FROM OTHER
OPERATIONS AND UPDATES THAT IT MAKES ARE DURABLY STORED.
98. • CONCURRENCY CONTROL:
• IT IS THE PROCESS OF MANAGING SIMULTANEOUS EXECUTION OF TRANSACTIONS IN
A SHARED DATABASE.
• CONCURRENCY CONTROL IS PROVIDED IN A DATABASE TO:
• (I) ENFORCE ISOLATION AMONG TRANSACTIONS.
• (II) PRESERVE DATABASE CONSISTENCY THROUGH CONSISTENCY PRESERVING
EXECUTION OF TRANSACTIONS.
• (III) RESOLVE READ-WRITE AND WRITE-READ CONFLICTS.
99. • LOCK BASED PROTOCOLS –
A LOCK IS A VARIABLE ASSOCIATED WITH A DATA ITEM THAT DESCRIBES THE
STATUS OF THE DATA ITEM WITH RESPECT TO THE OPERATIONS THAT CAN BE
APPLIED TO IT. LOCKS SYNCHRONIZE THE ACCESS BY CONCURRENT
TRANSACTIONS TO THE DATABASE ITEMS. THIS PROTOCOL REQUIRES
THAT ALL DATA ITEMS BE ACCESSED IN A MUTUALLY EXCLUSIVE
MANNER; THAT IS, A LOCK GUARANTEES EXCLUSIVE USE OF A DATA ITEM TO THE
CURRENT TRANSACTION.
• =>BEFORE ACCESSING THE DATA ITEM: ACQUIRE THE LOCK
• =>AFTER COMPLETION OF THE TRANSACTION: RELEASE THE LOCK
100. • 1) SHARED LOCK (S): ALSO KNOWN AS READ-ONLY LOCK. AS THE NAME
SUGGESTS IT CAN BE SHARED BETWEEN TRANSACTIONS BECAUSE WHILE
HOLDING THIS LOCK THE TRANSACTION DOES NOT HAVE THE PERMISSION TO
UPDATE DATA ON THE DATA ITEM. S-LOCK IS REQUESTED USING LOCK-S
INSTRUCTION.
• 2) EXCLUSIVE LOCK (X): THE DATA ITEM CAN BE BOTH READ AND
WRITTEN. THIS LOCK IS EXCLUSIVE: IT CANNOT BE HELD BY MORE THAN ONE
TRANSACTION ON THE SAME DATA ITEM AT THE SAME TIME. X-LOCK IS REQUESTED
USING LOCK-X INSTRUCTION.
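THE TWO LOCK MODES CAN BE ILLUSTRATED WITH A SMALL IN-MEMORY LOCK TABLE.
THIS IS A MINIMAL SKETCH, NOT A REAL LOCK MANAGER: THE CLASS LockTable, ITS
METHODS AND THE TRANSACTION NAMES ARE HYPOTHETICAL, AND A REFUSED REQUEST
SIMPLY RETURNS False INSTEAD OF MAKING THE TRANSACTION WAIT.

```python
# COMPATIBLE[(held, requested)]: may the requested mode be granted while
# another transaction holds the given mode on the same item?
COMPATIBLE = {
    ("S", "S"): True,   # several readers may share an item
    ("S", "X"): False,
    ("X", "S"): False,
    ("X", "X"): False,  # a writer needs the item to itself
}


class LockTable:
    def __init__(self):
        self.held = {}  # item -> {transaction: mode}

    def acquire(self, txn, item, mode):
        holders = self.held.setdefault(item, {})
        for other, other_mode in holders.items():
            if other != txn and not COMPATIBLE[(other_mode, mode)]:
                return False  # conflict: the caller would have to wait
        if holders.get(txn) != "X":  # never weaken an X lock already held
            holders[txn] = mode
        return True

    def release(self, txn, item):
        self.held.get(item, {}).pop(txn, None)


locks = LockTable()
print(locks.acquire("T1", "A", "S"))  # True
print(locks.acquire("T2", "A", "S"))  # True: S is shared
print(locks.acquire("T3", "A", "X"))  # False: readers still hold A
locks.release("T1", "A")
locks.release("T2", "A")
print(locks.acquire("T3", "A", "X"))  # True
```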
102. • CONVERSION OF LOCK:
=> IN TWO-PHASE LOCKING, LOCKS ON DATA ARE ACQUIRED IN THE FIRST (GROWING)
PHASE AND RELEASED IN THE SECOND (SHRINKING) PHASE. CONVERTING A
LOCK FROM SHARED TO EXCLUSIVE (UPGRADING) CAN BE DONE ONLY IN THE FIRST
PHASE, WHILE CONVERTING FROM EXCLUSIVE TO SHARED (DOWNGRADING) CAN BE
DONE ONLY IN THE RELEASE PHASE.
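THE PHASE RULE CAN BE CHECKED MECHANICALLY. THE SKETCH BELOW ASSUMES A
TRANSACTION IS GIVEN AS A LIST OF (ACTION, ITEM) PAIRS; THE ACTION NAMES AND
THE FUNCTION NAME ARE ILLUSTRATIVE ASSUMPTIONS. THE FIRST UNLOCK OR
DOWNGRADE IS TAKEN AS THE START OF THE RELEASE PHASE.

```python
GROWING_ACTIONS = {"lock-S", "lock-X", "upgrade"}
SHRINKING_ACTIONS = {"unlock", "downgrade"}


def follows_two_phase_rule(operations):
    """Return True if no lock is acquired or upgraded after the first release."""
    shrinking = False
    for action, _item in operations:
        if action in GROWING_ACTIONS:
            if shrinking:
                return False  # acquiring after releasing breaks the rule
        elif action in SHRINKING_ACTIONS:
            shrinking = True
        else:
            raise ValueError(f"unknown action: {action}")
    return True


ok = [("lock-S", "A"), ("upgrade", "A"), ("lock-X", "B"),
      ("downgrade", "A"), ("unlock", "B"), ("unlock", "A")]
bad = [("lock-S", "A"), ("unlock", "A"), ("lock-X", "B"), ("unlock", "B")]

print(follows_two_phase_rule(ok))   # True
print(follows_two_phase_rule(bad))  # False
```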
103. • TWO PHASE COMMIT PROTOCOL:
• COMMIT REQUEST OR VOTING PHASE.
• THE COORDINATOR SENDS A QUERY-TO-COMMIT MESSAGE TO ALL
PARTICIPANTS AND WAITS UNTIL IT HAS RECEIVED A REPLY FROM ALL
PARTICIPANTS.
• THE PARTICIPANTS EXECUTE THE TRANSACTION UP TO THE POINT WHERE THEY
WILL BE ASKED TO COMMIT. THEY EACH WRITE AN ENTRY TO THEIR UNDO LOG
AND AN ENTRY TO THEIR REDO LOG.
104. • EACH PARTICIPANT REPLIES WITH AN AGREEMENT MESSAGE (PARTICIPANT
VOTES YES TO COMMIT), IF THE PARTICIPANT'S ACTIONS SUCCEEDED, OR AN
ABORT MESSAGE (PARTICIPANT VOTES NO, NOT TO COMMIT), IF THE
PARTICIPANT EXPERIENCES A FAILURE THAT WILL MAKE IT IMPOSSIBLE TO
COMMIT.
105. • COMMIT OR COMPLETION PHASE OR DECISION PHASE:
• IF ALL PARTICIPANTS VOTED YES, THE COORDINATOR SENDS A COMMIT MESSAGE TO
ALL THE PARTICIPANTS.
• EACH PARTICIPANT COMPLETES THE OPERATION, AND RELEASES ALL THE LOCKS
AND RESOURCES HELD DURING THE TRANSACTION.
106. • EACH PARTICIPANT SENDS AN ACKNOWLEDGEMENT TO THE COORDINATOR.
• THE COORDINATOR COMPLETES THE TRANSACTION WHEN ALL
ACKNOWLEDGMENTS HAVE BEEN RECEIVED.
TWO RULES:
1) IF A [READY,T] MESSAGE IS RECEIVED FROM ALL SITES, THE TRANSACTION IS
COMMITTED.
2) IF AT LEAST ONE [NOT READY,T] MESSAGE IS RECEIVED FROM A SITE, THE
TRANSACTION IS ABORTED.
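THE TWO RULES AMOUNT TO A SIMPLE DECISION FUNCTION AT THE COORDINATOR. THE
SKETCH BELOW IS AN ILLUSTRATION ONLY: THE FUNCTION NAME, THE SITE NAMES AND
THE REPRESENTATION OF VOTES AS A DICTIONARY ARE ASSUMPTIONS, AND A SITE THAT
NEVER REPLIED IS TREATED AS NOT READY.

```python
def coordinator_decision(votes, participating_sites):
    """votes maps a site name to "READY" or "NOT READY"."""
    for site in participating_sites:
        # A missing vote (lost message, failed site) counts as NOT READY.
        if votes.get(site) != "READY":
            return "ABORT"
    return "COMMIT"


sites = ["S1", "S2", "S3"]
print(coordinator_decision({"S1": "READY", "S2": "READY", "S3": "READY"}, sites))
# COMMIT
print(coordinator_decision({"S1": "READY", "S2": "NOT READY", "S3": "READY"}, sites))
# ABORT
print(coordinator_decision({"S1": "READY", "S2": "READY"}, sites))
# ABORT (S3 never answered)
```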
107. • RECOVERY IN DDB
• POSSIBLE ERRORS:
• 1) LOSS OF A MESSAGE.
• 2) FAILURE OF THE SITE AT WHICH THE TRANSACTION IS RUNNING.
• 3) COMMUNICATION LINK DOWN.
108. • FAILURE OF PARTICIPATING SITE:
• 1) THE SITE FAILS BEFORE SENDING [READY,T].
• 2) THE SITE FAILS AFTER SENDING [READY,T].
109. • HANDLING A FAILURE OF A PARTICIPATING SITE:
• 1. THE RESPONSE OF THE TRANSACTION COORDINATOR (TC) OF TRANSACTION T.
• IF THE FAILED SITE HAS NOT SENT A [READY,T] MESSAGE, THE TC
CANNOT DECIDE TO COMMIT THE TRANSACTION (REMEMBER, IN A DISTRIBUTED
DATABASE ALL THE PARTICIPATING SITES MUST BE READY TO COMMIT; IF EVEN
ONE SITE IS NOT READY, THE WHOLE TRANSACTION MUST BE
ABORTED BY THE TC). HENCE, THE TRANSACTION T SHOULD BE ABORTED AND
THE OTHER PARTICIPATING SITES INFORMED.
110. • 2. THE RESPONSE OF THE FAILED SITE WHEN IT RECOVERS.
• WHEN IT RECOVERS FROM THE FAILURE, THE RECOVERING SITE SI MUST IDENTIFY
THE FATE OF THE TRANSACTIONS THAT WERE IN PROGRESS WHEN SI FAILED.
THIS CAN BE DONE BY EXAMINING THE LOG FILE ENTRIES OF SITE SI.
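ONE COMMON TEXTBOOK WAY TO EXAMINE THE LOG IS SKETCHED BELOW. THE NOTES DO
NOT SPELL OUT THE RULES, SO THE RECORD KINDS ("commit", "abort", "ready"),
THE LOG FORMAT AND THE FUNCTION NAME ARE ASSUMPTIONS OF THIS SKETCH.

```python
def fate_after_recovery(log, txn):
    """log is a list of (record_kind, transaction) pairs written by site Si."""
    kinds = {kind for kind, t in log if t == txn}
    if "commit" in kinds:
        return "redo"             # the decision was commit: reapply the updates
    if "abort" in kinds:
        return "undo"             # the decision was abort: roll back
    if "ready" in kinds:
        return "ask coordinator"  # Si voted yes but never learnt the outcome
    return "undo"                 # Si never voted, so T cannot have committed


log_si = [("ready", "T1"), ("commit", "T1"), ("ready", "T2"), ("start", "T3")]
for t in ("T1", "T2", "T3"):
    print(t, fate_after_recovery(log_si, t))
# T1 redo
# T2 ask coordinator
# T3 undo
```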
111. • FAILURE OF COORDINATING SITE:
• 1) AFTER THE COORDINATOR WRITES THE PREPARE LOG RECORD AND BEFORE
THE GATEWAY COMMIT PHASE.
• 2) AFTER THE COORDINATOR SENDS A COMMIT MESSAGE TO THE GATEWAY BUT
BEFORE IT RECEIVES A REPLY.
• 3) AFTER GATEWAY COMMIT PHASE BUT BEFORE THE COORDINATOR WRITES A
COMMIT RECORD TO THE LOGICAL LOG
112. • HANDLING THE FAILURE OF A COORDINATOR SITE
• LET US SUPPOSE THAT THE COORDINATOR SITE FAILED DURING EXECUTION OF THE 2
PHASE COMMIT (2PC) PROTOCOL FOR A TRANSACTION T. THIS SITUATION CAN BE
HANDLED IN TWO WAYS:
• THE OTHER SITES WHICH ARE PARTICIPATING IN THE TRANSACTION T MAY TRY TO
DECIDE THE FATE OF THE TRANSACTION. THAT IS, THEY MAY TRY TO DECIDE ON
COMMIT OR ABORT OF T USING THE CONTROL MESSAGES AVAILABLE IN EVERY SITE.
• THE SECOND WAY IS TO WAIT UNTIL THE COORDINATOR SITE RECOVERS.
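THE TWO WAYS CAN BE COMBINED IN ONE DECISION PROCEDURE FOR THE SURVIVING
SITES. THE RULES BELOW FOLLOW A COMMON TEXTBOOK TREATMENT AND ARE NOT STATED
IN THESE NOTES; THE RECORD KINDS, SITE NAMES AND FUNCTION NAME ARE
ASSUMPTIONS OF THIS SKETCH.

```python
def decide_without_coordinator(records_by_site):
    """records_by_site maps each active site to the set of control
    records ("ready", "commit", "abort") it has logged for T."""
    logs = list(records_by_site.values())
    if any("commit" in recs for recs in logs):
        return "COMMIT"  # some site already learnt that T committed
    if any("abort" in recs for recs in logs):
        return "ABORT"   # some site already learnt that T aborted
    if any("ready" not in recs for recs in logs):
        return "ABORT"   # a site never voted, so T cannot have committed
    return "WAIT"        # all voted ready: only the coordinator knows


print(decide_without_coordinator({"S1": {"ready"}, "S2": {"ready", "commit"}}))
# COMMIT
print(decide_without_coordinator({"S1": {"ready"}, "S2": set()}))
# ABORT
print(decide_without_coordinator({"S1": {"ready"}, "S2": {"ready"}}))
# WAIT
```

THE LAST CASE IS THE ONE IN WHICH THE SITES HAVE NO CHOICE BUT TO WAIT FOR
THE COORDINATOR TO RECOVER.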
113. • QUERY PROCESSING IN DISTRIBUTED DBMS
• QUERY PROCESSING IN A DISTRIBUTED DATABASE MANAGEMENT
SYSTEM REQUIRES THE TRANSMISSION OF DATA BETWEEN THE COMPUTERS IN A
NETWORK. A DISTRIBUTION STRATEGY FOR A QUERY IS THE ORDERING OF DATA
TRANSMISSIONS AND LOCAL DATA PROCESSING IN A DATABASE SYSTEM.
GENERALLY, A QUERY IN A DISTRIBUTED DBMS REQUIRES DATA FROM MULTIPLE SITES,
AND TRANSMITTING THAT DATA BETWEEN SITES INCURS COMMUNICATION COSTS.
QUERY PROCESSING IN A DISTRIBUTED DBMS DIFFERS FROM QUERY PROCESSING IN A
CENTRALIZED DBMS BECAUSE OF THIS COMMUNICATION COST OF DATA TRANSFER OVER
THE NETWORK. THE TRANSMISSION COST IS LOW WHEN SITES ARE CONNECTED THROUGH
HIGH-SPEED NETWORKS AND IS QUITE SIGNIFICANT IN OTHER NETWORKS.
114. • 1. COSTS (TRANSFER OF DATA) OF DISTRIBUTED QUERY PROCESSING :
• IN DISTRIBUTED QUERY PROCESSING, THE DATA TRANSFER COST IS THE COST OF
TRANSFERRING INTERMEDIATE FILES TO OTHER SITES FOR PROCESSING PLUS THE
COST OF TRANSFERRING THE FINAL RESULT FILES TO THE SITE WHERE
THE RESULT IS REQUIRED. LET’S SAY THAT A USER SENDS A QUERY TO SITE S1,
WHICH REQUIRES DATA FROM ITS OWN SITE AND ALSO FROM ANOTHER SITE S2.
THERE ARE THREE STRATEGIES TO PROCESS THIS QUERY, WHICH ARE
GIVEN BELOW:
115. • 1) WE CAN TRANSFER THE DATA FROM S2 TO S1 AND THEN PROCESS THE
QUERY
• 2) WE CAN TRANSFER THE DATA FROM S1 TO S2 AND THEN PROCESS THE
QUERY
• 3) WE CAN TRANSFER THE DATA FROM S1 AND S2 TO A THIRD SITE S3 AND THEN
PROCESS THE QUERY THERE.
• THE CHOICE DEPENDS ON FACTORS SUCH AS THE SIZE OF THE
RELATIONS AND OF THE RESULT, THE COMMUNICATION COST BETWEEN
THE SITES, AND THE SITE AT WHICH THE RESULT WILL BE USED.
116. COMMONLY, THE DATA TRANSFER COST IS CALCULATED IN TERMS OF THE SIZE
OF THE MESSAGES. BY USING THE BELOW FORMULA, WE CAN CALCULATE THE
DATA TRANSFER COST:
DATA TRANSFER COST = C * SIZE
WHERE C REFERS TO THE COST PER BYTE OF DATA TRANSFERRING AND SIZE IS
THE NO. OF BYTES TRANSMITTED.
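THE FORMULA TRANSLATES DIRECTLY INTO CODE. IN THIS SMALL SKETCH THE FUNCTION
NAMES AND THE SAMPLE FIGURES (200 RECORDS OF 40 BYTES, COST OF 1 PER BYTE)
ARE HYPOTHETICAL; SIZE IS TAKEN AS NUMBER OF RECORDS TIMES RECORD SIZE.

```python
def relation_size(num_records, record_size):
    """Number of bytes needed to ship a whole relation."""
    return num_records * record_size


def data_transfer_cost(cost_per_byte, size_in_bytes):
    """DATA TRANSFER COST = C * SIZE."""
    return cost_per_byte * size_in_bytes


size = relation_size(num_records=200, record_size=40)  # hypothetical relation
print(size)                                            # 8000
print(data_transfer_cost(cost_per_byte=1, size_in_bytes=size))  # 8000
```

WITH C = 1 THE COST IS SIMPLY THE NUMBER OF BYTES SENT.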
117. • EXAMPLE: CONSIDER THE FOLLOWING TABLES EMPLOYEE AND DEPARTMENT
• SITE1: EMPLOYEE (EID, NAME, SALARY, DID)
EID - 10 BYTES
NAME - 20 BYTES
SALARY - 20 BYTES
DID - 10 BYTES
TOTAL RECORDS - 1000
RECORD SIZE - 60 BYTES
118. • SITE2: DEPARTMENT (DID, DNAME)
DID - 10 BYTES
DNAME - 20 BYTES
TOTAL RECORDS - 50
RECORD SIZE - 30 BYTES
119. • EXAMPLE : FIND THE NAMES OF EMPLOYEES AND THEIR DEPARTMENT NAMES. ALSO, FIND THE
AMOUNT OF DATA TRANSFERRED TO EXECUTE THIS QUERY WHEN THE QUERY IS SUBMITTED TO
SITE 3.
• ANSWER : THE QUERY IS SUBMITTED AT SITE 3, AND NEITHER OF THE TWO
RELATIONS, EMPLOYEE AND DEPARTMENT, IS AVAILABLE AT SITE 3. SO, TO
EXECUTE THIS QUERY, WE HAVE THREE STRATEGIES:
• 1) TRANSFER BOTH TABLES, EMPLOYEE AND DEPARTMENT, TO SITE 3 AND JOIN
THEM THERE. THE TOTAL COST IS 1000 * 60 + 50 * 30 = 60,000 + 1,500 =
61,500 BYTES.
• 2) TRANSFER THE TABLE EMPLOYEE TO SITE 2, JOIN THE TABLES AT SITE 2 AND THEN TRANSFER
THE RESULT TO SITE 3. THE RESULT HAS 1000 TUPLES OF NAME (20 BYTES) AND DNAME
(20 BYTES), THAT IS 40 BYTES EACH, SO THE TOTAL COST IS
60 * 1000 + 40 * 1000 = 100,000 BYTES.
120. • 3) TRANSFER THE TABLE DEPARTMENT TO SITE 1, JOIN THE TABLES AT
SITE 1 AND THEN TRANSFER THE RESULT TO SITE 3. THE
TOTAL COST IS 30 * 50 + 40 * 1000 = 41,500 BYTES, SINCE THE SAME
1000 RESULT TUPLES OF NAME AND DNAME, 40 BYTES EACH, ARE SENT FROM
SITE 1 TO SITE 3.
• NOW, IF THE OPTIMISATION CRITERION IS TO REDUCE THE AMOUNT OF DATA
TRANSFER, STRATEGY 3 IS THE BEST OF THE THREE.
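THE THREE STRATEGIES CAN BE COMPARED WITH A SHORT SCRIPT. THIS IS A SKETCH
UNDER STATED ASSUMPTIONS: THE WIDTH OF A RESULT TUPLE IS DERIVED FROM THE
ATTRIBUTE SIZES LISTED FOR THE TWO TABLES (NAME + DNAME), EVERY EMPLOYEE IS
ASSUMED TO MATCH EXACTLY ONE DEPARTMENT (1000 RESULT TUPLES), THE COST PER
BYTE IS 1, AND ALL NAMES IN THE CODE ARE ILLUSTRATIVE.

```python
ATTR_BYTES = {"EID": 10, "NAME": 20, "SALARY": 20, "DID": 10, "DNAME": 20}
EMPLOYEE = {"records": 1000, "attrs": ["EID", "NAME", "SALARY", "DID"]}
DEPARTMENT = {"records": 50, "attrs": ["DID", "DNAME"]}


def width(attrs):
    """Bytes per tuple for the given attribute list."""
    return sum(ATTR_BYTES[a] for a in attrs)


def table_bytes(table):
    return table["records"] * width(table["attrs"])


def strategy_costs(result_attrs, result_records):
    result_bytes = result_records * width(result_attrs)
    return {
        1: table_bytes(EMPLOYEE) + table_bytes(DEPARTMENT),  # ship both to site 3
        2: table_bytes(EMPLOYEE) + result_bytes,    # join at site 2, ship result
        3: table_bytes(DEPARTMENT) + result_bytes,  # join at site 1, ship result
    }


costs = strategy_costs(["NAME", "DNAME"], result_records=1000)
cheapest = min(costs, key=costs.get)
print(costs, "cheapest:", cheapest)
```

CHANGING THE RESULT ATTRIBUTES OR THE NUMBER OF RESULT TUPLES SHOWS HOW
SENSITIVE THE CHOICE OF STRATEGY IS TO THE SIZE OF THE RESULT.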
121. • 2. USING SEMI JOIN IN DISTRIBUTED QUERY PROCESSING :
• THE SEMI-JOIN OPERATION IS USED IN DISTRIBUTED QUERY PROCESSING TO REDUCE
THE NUMBER OF TUPLES IN A TABLE BEFORE TRANSMITTING IT TO ANOTHER SITE.
THIS REDUCTION IN THE NUMBER OF TUPLES REDUCES THE NUMBER AND THE TOTAL
SIZE OF THE TRANSMISSIONS, WHICH ULTIMATELY REDUCES THE TOTAL COST OF
DATA TRANSFER. LET’S SAY THAT WE HAVE TWO TABLES R1 AND R2 ON SITES S1 AND S2
RESPECTIVELY. WE FORWARD THE JOINING COLUMN OF ONE TABLE, SAY R1, TO THE SITE
WHERE THE OTHER TABLE, SAY R2, IS LOCATED. THIS COLUMN IS JOINED WITH R2 AT
THAT SITE, SO THAT ONLY THE TUPLES OF R2 THAT HAVE A MATCH IN R1 NEED TO BE
TRANSMITTED. THE DECISION WHETHER TO REDUCE R1 OR R2 CAN ONLY BE MADE
AFTER COMPARING THE ADVANTAGES OF REDUCING R1 WITH THAT OF REDUCING
R2. THUS, SEMI-JOIN IS AN EFFICIENT WAY TO REDUCE THE TRANSFER
OF DATA IN DISTRIBUTED QUERY PROCESSING.
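THE IDEA CAN BE SHOWN ON TOY DATA. IN THE SKETCH BELOW THE ROWS, THE COLUMN
NAMES AND THE FUNCTION NAME ARE HYPOTHETICAL; R1 LIVES AT SITE S1, R2 AT SITE
S2, AND THE COMMENTS MARK WHAT WOULD TRAVEL OVER THE NETWORK.

```python
def semi_join_reduce(rows, join_values, key):
    """Keep only the rows whose join column matches a shipped value."""
    wanted = set(join_values)
    return [row for row in rows if row[key] in wanted]


# Site S1
r1 = [{"NAME": "ASHA", "DID": 10}, {"NAME": "RAVI", "DID": 10},
      {"NAME": "MEENA", "DID": 30}]
# Site S2
r2 = [{"DID": 10, "DNAME": "SALES"}, {"DID": 20, "DNAME": "HR"},
      {"DID": 30, "DNAME": "IT"}, {"DID": 40, "DNAME": "LEGAL"}]

join_column = {row["DID"] for row in r1}                 # sent S1 -> S2
reduced_r2 = semi_join_reduce(r2, join_column, "DID")    # sent S2 -> S1

# The final join runs at S1 on the reduced copy of R2.
result = [(e["NAME"], d["DNAME"])
          for e in r1 for d in reduced_r2 if e["DID"] == d["DID"]]

print(len(r2), "->", len(reduced_r2))  # 4 -> 2
print(result)
# [('ASHA', 'SALES'), ('RAVI', 'SALES'), ('MEENA', 'IT')]
```

ONLY TWO OF THE FOUR R2 TUPLES CROSS THE NETWORK, AT THE PRICE OF FIRST
SENDING THE SMALL SET OF DID VALUES IN THE OTHER DIRECTION.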
122. • EXAMPLE : FIND THE AMOUNT OF DATA TRANSFERRED TO EXECUTE THE SAME QUERY GIVEN
IN THE ABOVE EXAMPLE USING SEMI-JOIN OPERATION.
• ANSWER : THE FOLLOWING STRATEGY CAN BE USED TO EXECUTE THE QUERY.
• PROJECT THE ATTRIBUTES NAME AND DID OF THE EMPLOYEE TABLE AT SITE 1 AND THEN
TRANSFER THEM TO SITE 3. WITH NAME AT 20 BYTES AND DID AT 10 BYTES, THE
SIZE IS 30 * 1000 = 30,000 BYTES.
• TRANSFER THE TABLE DEPARTMENT TO SITE 3 AND JOIN THE PROJECTED ATTRIBUTES OF
EMPLOYEE WITH THIS TABLE. THE SIZE OF THE DEPARTMENT TABLE IS 30 * 50 = 1,500 BYTES.
• APPLYING THE ABOVE SCHEME, THE AMOUNT OF DATA TRANSFERRED TO EXECUTE THE QUERY
WILL BE 30,000 + 1,500 = 31,500 BYTES.
123. • DISTRIBUTED SECURITY MODEL
• => IT HELPS TO SECURE PROCESSES AND CHANNELS AGAINST UNAUTHORIZED ACCESS.
• 1) PROTECTING OBJECT:
• SPECIFIES WHO IS ALLOWED TO PERFORM ACTIONS ON THE OBJECT.
• 2) THREAT TO PROCESS:
• AN ATTACKER BETWEEN THE CLIENT AND THE SERVER CAN GAIN ACCESS TO THE SERVER.
124. • 3) THREAT TO CHANNEL COMMUNICATION:
• => A MALICIOUS USER CAN COPY, ALTER, OR INJECT MESSAGES ON THE CHANNEL.
• =>