DISTRIBUTED DATABASE
SEMESTER 5TH
MUHAMMAD
WAQAS
• A CENTRALIZED DATABASE (SOMETIMES ABBREVIATED CDB) IS A
DATABASE THAT IS LOCATED, STORED, AND MAINTAINED IN A
SINGLE LOCATION. USERS ACCESS A CENTRALIZED DATABASE
THROUGH A COMPUTER NETWORK THAT GIVES THEM
ACCESS TO THE CENTRAL CPU, WHICH IN TURN MAINTAINS
THE DATABASE ITSELF.
• LIST OF THE ADVANTAGES OF A CENTRALIZED DATABASE
• IT ALLOWS FOR WORKING ON CROSS-FUNCTIONAL PROJECTS.
• IT IS EASIER TO SHARE IDEAS ACROSS ANALYSTS.
• ANALYSTS CAN BE ASSIGNED TO SPECIFIC PROBLEMS OR PROJECTS CENTRALLY.
• HIGHER LEVELS OF SECURITY CAN BE OBTAINED.
• HIGHER LEVELS OF DEPENDABILITY ARE PRESENT WITHIN THE SYSTEM.
• DISADVANTAGES
• CENTRALIZED DATABASES ARE HIGHLY DEPENDENT ON NETWORK
CONNECTIVITY.
• BOTTLENECKS CAN OCCUR AS A RESULT OF HIGH TRAFFIC.
• ACCESS BY MORE THAN ONE PERSON TO THE SAME SET OF DATA IS LIMITED, AS
THERE IS ONLY ONE COPY OF IT AND IT IS MAINTAINED IN A SINGLE LOCATION.
• DDB:
A DISTRIBUTED DATABASE (DDB) IS A COLLECTION OF MULTIPLE, LOGICALLY
INTERRELATED DATABASES DISTRIBUTED OVER A COMPUTER NETWORK.
A DISTRIBUTED DATABASE MANAGEMENT SYSTEM (D–DBMS) IS THE SOFTWARE THAT
MANAGES THE DDB AND PROVIDES AN ACCESS MECHANISM THAT MAKES THIS
DISTRIBUTION TRANSPARENT TO THE USERS.
DISTRIBUTED DATABASE SYSTEM (DDBS) = DDB + D–DBMS
• ADVANTAGES OF DISTRIBUTED DATABASE SYSTEM
• RELIABLE
• IN A DISTRIBUTED DATABASE MANAGEMENT SYSTEM, IF ONE CONNECTED SITE FAILS, THE REMAINING SITES
CONTINUE FUNCTIONING, SO THE SYSTEM AS A WHOLE IS MORE RELIABLE THAN A SINGLE-SITE
DATABASE MANAGEMENT SYSTEM.
• LOW COMMUNICATION COST
• DATA IS STORED LOCALLY, CLOSE TO WHERE IT IS USED, SO COMMUNICATION AND DATA
MANIPULATION BECOME EASIER AND LESS COSTLY.
• MODULAR DEVELOPMENT
• MODULAR GROWTH IS EASY IN A DISTRIBUTED DATABASE MANAGEMENT SYSTEM. NEW SITES CAN BE ADDED BY
INSTALLING THEM AND CONNECTING THEM TO THE DISTRIBUTED DATABASE SYSTEM, WITHOUT INTERRUPTING
THE EXISTING SITES.
• DATA RECOVERY
• DATA CAN BE RECOVERED MORE EASILY IN DISTRIBUTED DATABASE MANAGEMENT
SYSTEMS WHEN COPIES EXIST AT OTHER SITES.
• DISADVANTAGES OF DISTRIBUTED DATABASE SYSTEM
• DATA INTEGRITY
• UPDATING DATA AT MULTIPLE SITES CAN CAUSE PROBLEMS. MAINTAINING DATA INTEGRITY IS
MORE COMPLEX AND HARDER TO HANDLE.
• DUPLICATION OF DATA
• STORING THE SAME DATA IN DIFFERENT SYSTEMS DUPLICATES IT, AND THE COPIES
TAKE UP EXTRA STORAGE SPACE ON EACH COMPUTER SYSTEM.
• IMPROPER DATA DISTRIBUTION
• IMPROPER DATA DISTRIBUTION CAN LEAD TO SLOW QUERY PROCESSING.
STORING THE SAME DATA ON DIFFERENT COMPUTERS CAN CREATE
FURTHER PROBLEMS.
• LESS PROCESSING SPEED
• EVEN A SIMPLE QUERY MAY NEED MUCH COMMUNICATION BETWEEN SITES, SO
IT CAN TAKE LONGER TO ANSWER.
• SECURITY PROBLEM.
• DESIGN ISSUES OF DISTRIBUTED SYSTEM :
• 1) HETEROGENEITY : HETEROGENEITY APPLIES TO THE NETWORK, COMPUTER
HARDWARE, OPERATING SYSTEMS AND IMPLEMENTATIONS BY DIFFERENT
DEVELOPERS. A KEY COMPONENT OF A HETEROGENEOUS DISTRIBUTED
CLIENT-SERVER ENVIRONMENT IS MIDDLEWARE. MIDDLEWARE IS A SET
OF SERVICES THAT ENABLES APPLICATIONS AND END USERS TO INTERACT WITH
EACH OTHER ACROSS A HETEROGENEOUS DISTRIBUTED SYSTEM.
• 2) OPENNESS: THE OPENNESS OF THE DISTRIBUTED SYSTEM IS DETERMINED
PRIMARILY BY THE DEGREE TO WHICH NEW RESOURCE-SHARING SERVICES CAN
BE MADE AVAILABLE TO THE USERS. OPEN SYSTEMS ARE CHARACTERIZED BY
THE FACT THAT THEIR KEY INTERFACES ARE PUBLISHED. IT IS BASED ON A
UNIFORM COMMUNICATION MECHANISM AND PUBLISHED INTERFACE FOR
ACCESS TO SHARED RESOURCES. IT CAN BE CONSTRUCTED FROM
HETEROGENEOUS HARDWARE AND SOFTWARE.
• 3) SCALABILITY: THE SYSTEM SHOULD REMAIN EFFICIENT EVEN
WITH A SIGNIFICANT INCREASE IN THE NUMBER OF USERS AND RESOURCES
CONNECTED.
• 4) SECURITY : SECURITY OF AN INFORMATION SYSTEM HAS THREE COMPONENTS:
CONFIDENTIALITY, INTEGRITY AND AVAILABILITY. ENCRYPTION PROTECTS
SHARED RESOURCES AND KEEPS SENSITIVE INFORMATION SECRET WHEN
TRANSMITTED.
• 5) TRANSPARENCY : TRANSPARENCY ENSURES THAT THE DISTRIBUTED SYSTEM
IS PERCEIVED AS A SINGLE ENTITY BY THE USERS OR THE APPLICATION
PROGRAMMERS RATHER THAN AS A COLLECTION OF COOPERATING AUTONOMOUS
SYSTEMS. THE USER SHOULD BE UNAWARE OF WHERE THE
SERVICES ARE LOCATED, AND THE TRANSFER FROM A LOCAL MACHINE TO A
REMOTE ONE SHOULD BE TRANSPARENT.
• 7) CONCURRENCY: THERE IS A POSSIBILITY THAT SEVERAL CLIENTS WILL
ATTEMPT TO ACCESS A SHARED RESOURCE AT THE SAME TIME. MULTIPLE USERS
MAKE REQUESTS ON THE SAME RESOURCES, I.E. READ, WRITE, AND UPDATE.
EACH RESOURCE MUST BE SAFE IN A CONCURRENT ENVIRONMENT. ANY OBJECT
THAT REPRESENTS A SHARED RESOURCE IN A DISTRIBUTED SYSTEM MUST ENSURE
THAT IT OPERATES CORRECTLY IN A CONCURRENT ENVIRONMENT.
• TYPES OF TRANSPARENCY:
• 1) LOCATION TRANSPARENCY:
• LOCATION TRANSPARENCY ENSURES THAT THE USER CAN QUERY ANY
TABLE(S) OR FRAGMENT(S) OF A TABLE AS IF THEY WERE STORED LOCALLY AT
THE USER'S SITE. THE FACT THAT THE TABLE OR ITS FRAGMENTS ARE STORED
AT A REMOTE SITE IN THE DISTRIBUTED DATABASE SYSTEM SHOULD BE
COMPLETELY HIDDEN FROM THE END USER.
• 2) REPLICATION TRANSPARENCY:
• REPLICATION TRANSPARENCY ENSURES THAT THE REPLICATION OF DATABASES IS
HIDDEN FROM THE USERS. IT ENABLES USERS TO QUERY A TABLE AS IF
ONLY A SINGLE COPY OF THE TABLE EXISTS. ALSO, IN CASE OF FAILURE OF A
SITE, THE USER CAN STILL PROCEED WITH QUERIES USING REPLICATED
COPIES WITHOUT ANY KNOWLEDGE OF THE FAILURE.
• NAMING TRANSPARENCY:
• A TRANSPARENCY IS SOME ASPECT OF THE DISTRIBUTED SYSTEM THAT
IS HIDDEN FROM THE USER (PROGRAMMER, SYSTEM DEVELOPER, USER OR
APPLICATION PROGRAM). A TRANSPARENCY IS PROVIDED BY INCLUDING SOME
SET OF MECHANISMS IN THE DISTRIBUTED SYSTEM AT A LAYER BELOW THE
INTERFACE WHERE THE TRANSPARENCY IS REQUIRED.
• A RELATIONAL DATABASE IS A TYPE OF DATABASE THAT STORES AND PROVIDES
ACCESS TO DATA POINTS THAT ARE RELATED TO ONE ANOTHER. THE COLUMNS
OF THE TABLE HOLD ATTRIBUTES OF THE DATA, AND EACH RECORD USUALLY HAS A
VALUE FOR EACH ATTRIBUTE, MAKING IT EASY TO ESTABLISH THE RELATIONSHIPS
AMONG DATA POINTS.
• PROPERTIES OF RELATIONAL DATABASES
• VALUES ARE ATOMIC.
• ALL OF THE VALUES IN A COLUMN HAVE THE SAME DATA TYPE.
• EACH ROW IS UNIQUE.
• THE SEQUENCE OF COLUMNS IS INSIGNIFICANT.
• THE SEQUENCE OF ROWS IS INSIGNIFICANT.
• EACH COLUMN HAS A UNIQUE NAME.
• INTEGRITY CONSTRAINTS MAINTAIN DATA CONSISTENCY ACROSS MULTIPLE
TABLES.
• CLIENT-SERVER ARCHITECTURE IS A COMPUTING MODEL IN WHICH THE SERVER
HOSTS, DELIVERS AND MANAGES MOST OF THE RESOURCES AND SERVICES TO
BE CONSUMED BY THE CLIENT. THIS TYPE OF ARCHITECTURE HAS ONE OR MORE
CLIENT COMPUTERS CONNECTED TO A CENTRAL SERVER OVER A NETWORK OR
INTERNET CONNECTION.
• PEER-TO-PEER ARCHITECTURE (P2P ARCHITECTURE) IS A COMMONLY USED
COMPUTER NETWORKING ARCHITECTURE IN WHICH EACH WORKSTATION, OR
NODE, HAS THE SAME CAPABILITIES AND RESPONSIBILITIES. IT IS OFTEN
COMPARED AND CONTRASTED TO THE CLASSIC CLIENT/SERVER ARCHITECTURE,
IN WHICH SOME COMPUTERS ARE DEDICATED TO SERVING OTHERS
• 3)MULTI DBMS ARCHITECTURE:
THIS IS AN INTEGRATED DATABASE SYSTEM FORMED BY A COLLECTION OF TWO
OR MORE AUTONOMOUS DATABASE SYSTEMS. MULTI-DBMS CAN BE EXPRESSED
THROUGH SIX LEVELS OF SCHEMAS − MULTI-DATABASE VIEW LEVEL − DEPICTS
MULTIPLE USER VIEWS COMPRISING OF SUBSETS OF THE INTEGRATED
DISTRIBUTED DATABASE
• TWO TYPES OF CLIENT-SERVER ARCHITECTURE:
• 1) SINGLE SERVER MULTIPLE CLIENT: A TYPE OF SOFTWARE ARCHITECTURE
FOR COMPUTER NETWORKS WHERE CLIENTS, WHICH CAN BE BASIC WORKSTATIONS
OR FULLY FUNCTIONAL PERSONAL COMPUTERS, REQUEST INFORMATION FROM
ONE SERVER COMPUTER. ONE SERVER IS ABLE TO HANDLE DOZENS OF
INFORMATION REQUESTS FROM CLIENT COMPUTERS SIMULTANEOUSLY.
• 2) MULTIPLE SERVER MULTIPLE CLIENT: A TYPE OF SOFTWARE ARCHITECTURE
FOR COMPUTER NETWORKS WHERE CLIENTS REQUEST INFORMATION FROM SEVERAL
SERVER COMPUTERS RATHER THAN ONE. BY CONTRAST, THE MOST COMMON
ARRANGEMENT FOR SMALL BUSINESSES AND HOMES IS THE SINGLE SERVER
WITH MULTIPLE CLIENTS.
• PEER- TO-PEER ARCHITECTURE FOR DDBMS
• IN THESE SYSTEMS, EACH PEER ACTS BOTH AS A CLIENT AND A SERVER FOR PROVIDING
DATABASE SERVICES. THE PEERS SHARE THEIR RESOURCES WITH OTHER PEERS AND
COORDINATE THEIR ACTIVITIES.
• THIS ARCHITECTURE GENERALLY HAS FOUR LEVELS OF SCHEMAS −
• GLOBAL CONCEPTUAL SCHEMA − DEPICTS THE GLOBAL LOGICAL VIEW OF DATA.
• LOCAL CONCEPTUAL SCHEMA − DEPICTS LOGICAL DATA ORGANIZATION AT EACH SITE.
• LOCAL INTERNAL SCHEMA − DEPICTS PHYSICAL DATA ORGANIZATION AT EACH SITE.
• EXTERNAL SCHEMA − DEPICTS USER VIEW OF DATA.
• MULTI - DBMS ARCHITECTURES
• THIS IS AN INTEGRATED DATABASE SYSTEM FORMED BY A COLLECTION OF TWO OR MORE AUTONOMOUS
DATABASE SYSTEMS.
• MULTI-DBMS CAN BE EXPRESSED THROUGH SIX LEVELS OF SCHEMAS −
• MULTI-DATABASE VIEW LEVEL − DEPICTS MULTIPLE USER VIEWS COMPRISING SUBSETS OF THE INTEGRATED
DISTRIBUTED DATABASE.
• MULTI-DATABASE CONCEPTUAL LEVEL − DEPICTS THE INTEGRATED MULTI-DATABASE, WHICH COMPRISES GLOBAL
LOGICAL MULTI-DATABASE STRUCTURE DEFINITIONS.
• MULTI-DATABASE INTERNAL LEVEL − DEPICTS THE DATA DISTRIBUTION ACROSS DIFFERENT SITES AND MULTI-
DATABASE TO LOCAL DATA MAPPING.
• LOCAL DATABASE VIEW LEVEL − DEPICTS PUBLIC VIEW OF LOCAL DATA.
• LOCAL DATABASE CONCEPTUAL LEVEL − DEPICTS LOCAL DATA ORGANIZATION
AT EACH SITE.
• LOCAL DATABASE INTERNAL LEVEL − DEPICTS PHYSICAL DATA ORGANIZATION
AT EACH SITE.
• THERE ARE TWO DESIGN ALTERNATIVES FOR MULTI-DBMS −
• MODEL WITH MULTI-DATABASE CONCEPTUAL LEVEL.
• MODEL WITHOUT MULTI-DATABASE CONCEPTUAL LEVEL.
• WHAT IS A DATA DISTRIBUTION STRATEGY:
• THE DISTRIBUTION STRATEGY IS THAT, BY ALLOCATING DIFFERENT RESOURCES (E.G.
A NUMBER OF DATABASE NODES) TO DIFFERENT CLASSES OF USERS, WE CAN
ROUTE THE DATABASE REQUESTS TO DIFFERENT RESOURCES.
• DATA REPLICATION
• DATA REPLICATION IS THE PROCESS OF STORING SEPARATE COPIES
OF THE DATABASE AT TWO OR MORE SITES. IT IS A POPULAR
FAULT TOLERANCE TECHNIQUE OF DISTRIBUTED DATABASES.
• ADVANTAGES OF DATA REPLICATION
• RELIABILITY − IN CASE OF FAILURE OF ANY SITE, THE DATABASE SYSTEM
CONTINUES TO WORK SINCE A COPY IS AVAILABLE AT ANOTHER SITE(S).
• REDUCTION IN NETWORK LOAD − SINCE LOCAL COPIES OF DATA ARE
AVAILABLE, QUERY PROCESSING CAN BE DONE WITH REDUCED NETWORK USAGE,
PARTICULARLY DURING PRIME HOURS. DATA UPDATING CAN BE DONE AT NON-
PRIME HOURS.
• QUICKER RESPONSE − AVAILABILITY OF LOCAL COPIES OF DATA ENSURES
QUICK QUERY PROCESSING AND CONSEQUENTLY QUICK RESPONSE TIME.
• SIMPLER TRANSACTIONS − TRANSACTIONS REQUIRE FEWER JOINS
OF TABLES LOCATED AT DIFFERENT SITES AND MINIMAL COORDINATION
ACROSS THE NETWORK. THUS, THEY BECOME SIMPLER IN NATURE.
• DISADVANTAGES OF DATA REPLICATION
• INCREASED STORAGE REQUIREMENTS − MAINTAINING MULTIPLE COPIES
OF DATA IS ASSOCIATED WITH INCREASED STORAGE COSTS. THE STORAGE
SPACE REQUIRED IS IN MULTIPLES OF THE STORAGE REQUIRED FOR A
CENTRALIZED SYSTEM.
• INCREASED COST AND COMPLEXITY OF DATA UPDATING − EACH
TIME A DATA ITEM IS UPDATED, THE UPDATE NEEDS TO BE REFLECTED IN ALL
THE COPIES OF THE DATA AT THE DIFFERENT SITES. THIS REQUIRES COMPLEX
SYNCHRONIZATION TECHNIQUES AND PROTOCOLS.
• UNDESIRABLE APPLICATION – DATABASE COUPLING :
• IF COMPLEX UPDATE MECHANISMS ARE NOT USED, REMOVING DATA
INCONSISTENCY REQUIRES COMPLEX CO-ORDINATION AT APPLICATION LEVEL.
THIS RESULTS IN UNDESIRABLE APPLICATION – DATABASE COUPLING.
• UPDATING DISTRIBUTED DATA
• SYNCHRONOUS REPLICATION CONTROL
• IN SYNCHRONOUS REPLICATION APPROACH, THE DATABASE IS SYNCHRONIZED
SO THAT ALL THE REPLICATIONS ALWAYS HAVE THE SAME VALUE. A
TRANSACTION REQUESTING A DATA ITEM WILL HAVE ACCESS TO THE SAME
VALUE IN ALL THE SITES.
• ASYNCHRONOUS REPLICATION CONTROL
• IN THE ASYNCHRONOUS REPLICATION APPROACH, THE REPLICAS DO
NOT ALWAYS MAINTAIN THE SAME VALUE. ONE OR MORE REPLICAS
MAY STORE AN OUTDATED VALUE, AND A TRANSACTION CAN SEE
DIFFERENT VALUES AT DIFFERENT SITES. THE PROCESS OF BRINGING ALL THE
REPLICAS TO THE CURRENT VALUE IS CALLED SYNCHRONIZATION.
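• THE TWO REPLICATION CONTROL APPROACHES ABOVE CAN BE SKETCHED IN PYTHON AS
FOLLOWS. THIS IS ONLY AN ILLUSTRATION: THE CLASS ReplicatedItem, ITS METHODS AND
THE SITE NAMES "A", "B", "C" ARE HYPOTHETICAL, AND REAL SYSTEMS USE NETWORK
PROTOCOLS RATHER THAN AN IN-MEMORY DICTIONARY.

```python
class ReplicatedItem:
    """One data item copied to several sites (site name -> value)."""

    def __init__(self, sites, value):
        self.copies = {site: value for site in sites}
        self.pending = []  # updates not yet propagated (asynchronous mode)

    def write_sync(self, value):
        # Synchronous control: every replica is updated as part of the write.
        for site in self.copies:
            self.copies[site] = value

    def write_async(self, origin, value):
        # Asynchronous control: only the originating replica changes now.
        self.copies[origin] = value
        self.pending.append(value)

    def synchronize(self):
        # Bring all replicas to the most recent value.
        if self.pending:
            latest = self.pending[-1]
            for site in self.copies:
                self.copies[site] = latest
            self.pending.clear()


item = ReplicatedItem(["A", "B", "C"], 10)

item.write_sync(20)
sync_values = set(item.copies.values())    # one value everywhere

item.write_async("A", 30)
stale_values = dict(item.copies)           # B and C still hold the old value

item.synchronize()
final_values = set(item.copies.values())   # replicas agree again
```

• BETWEEN write_async AND synchronize, A READ AT SITE "B" OR "C" WOULD RETURN THE
OUTDATED VALUE, WHICH IS THE BEHAVIOUR DESCRIBED FOR ASYNCHRONOUS REPLICATION.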
• FRAGMENTATION. FRAGMENTATION IS THE TASK OF DIVIDING A TABLE INTO
A SET OF SMALLER TABLES. THE SUBSETS OF THE TABLE ARE CALLED
FRAGMENTS. FRAGMENTATION CAN BE OF THREE TYPES: HORIZONTAL,
VERTICAL, AND HYBRID (COMBINATION OF HORIZONTAL AND VERTICAL).
• WHAT ARE THE ADVANTAGES OF USING FRAGMENTATION:
• THE MAIN ADVANTAGE OF FRAGMENTATION IS TO IMPROVE THE PERFORMANCE
OF DISTRIBUTED DATABASE DESIGN BY INCREASING THE EFFICIENCY SINCE
DATA IS STORED ONLY WHERE IT IS NEEDED. FRAGMENTS CAN BE ALLOCATED
AT DIFFERENT NETWORK SITES IN A PROCESS CALLED DATA ALLOCATION.
• WHY WE USE FRAGMENTATION:
• FRAGMENTATION IS A DATABASE SERVER FEATURE THAT ALLOWS YOU TO
CONTROL WHERE DATA IS STORED AT THE TABLE LEVEL. FRAGMENTATION
ENABLES YOU TO DEFINE GROUPS OF ROWS OR INDEX KEYS WITHIN A TABLE
ACCORDING TO SOME ALGORITHM OR SCHEME. ... YOU CAN USE THIS TABLE
TO ACCESS INFORMATION ABOUT YOUR FRAGMENTED TABLES AND INDEXES.
• 1)HORIZONTAL FRAGMENTATION: HF INVOLVES TAKING ROWS
(RECORDS) FROM A TABLE AND PLACING DIFFERENT ROWS AT DIFFERENT NODES
(LOCATIONS). FOR EXAMPLE, THE CUSTOMER TABLE MAY BE FRAGMENTED SUCH
THAT THE CUSTOMERS FOR A GIVEN OFFICE ARE STORED AT THAT OFFICE.
• CORRECTNESS RULES OF FRAGMENTATION
• 1) COMPLETENESS: THIS RULE ENSURES THAT THERE IS NO LOSS OF DATA DUE TO
FRAGMENTATION: EVERY RECORD THAT WAS PART OF THE TABLE BEFORE
FRAGMENTATION MUST BE FOUND IN AT LEAST ONE OF THE FRAGMENTS AFTER
FRAGMENTATION.
• 2) RECONSTRUCTION: THIS RULE ENSURES THE ABILITY TO RECONSTRUCT THE
ORIGINAL TABLE FROM THE FRAGMENTS THAT ARE CREATED. IT ALSO
CHECKS WHETHER THE FUNCTIONAL DEPENDENCIES ARE PRESERVED.
• 3) DISJOINTNESS: THIS RULE ENSURES THAT NO RECORD BECOMES A PART OF
TWO OR MORE DIFFERENT FRAGMENTS DURING THE FRAGMENTATION PROCESS.
IF A TABLE R IS PARTITIONED INTO FRAGMENTS R1, R2, …, RN,
THEN DISJOINTNESS REQUIRES THE FOLLOWING:
• R_i ∩ R_j = ∅ FOR ALL i ≠ j
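• A MINIMAL PYTHON SKETCH OF THE THREE CORRECTNESS RULES FOR HORIZONTAL FRAGMENTS,
ASSUMING ROWS ARE TUPLES AND EACH FRAGMENT IS A LIST OF ROWS. THE SAMPLE ROWS AND
THE FUNCTION NAMES (is_complete, is_disjoint, reconstruct) ARE HYPOTHETICAL.
DISJOINTNESS IS CHECKED PAIRWISE, THAT IS, NO ROW MAY APPEAR IN TWO FRAGMENTS.

```python
from itertools import combinations


def is_complete(table, fragments):
    # Completeness: every original row is found in at least one fragment.
    return all(any(row in frag for frag in fragments) for row in table)


def is_disjoint(fragments):
    # Disjointness: no row belongs to two different fragments.
    return all(not (set(a) & set(b)) for a, b in combinations(fragments, 2))


def reconstruct(fragments):
    # Reconstruction of horizontal fragments: union of all fragments.
    rows = set()
    for frag in fragments:
        rows |= set(frag)
    return rows


# Hypothetical table of (customer id, office) rows, fragmented by office.
table = [(1, "North"), (2, "South"), (3, "North"), (4, "East")]
good = [
    [(1, "North"), (3, "North")],
    [(2, "South")],
    [(4, "East")],
]
# A faulty fragmentation: row 4 is lost and row 1 is stored twice.
bad = [
    [(1, "North"), (3, "North")],
    [(2, "South"), (1, "North")],
]

good_ok = (is_complete(table, good), is_disjoint(good),
           reconstruct(good) == set(table))
bad_ok = (is_complete(table, bad), is_disjoint(bad),
          reconstruct(bad) == set(table))
```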
• TYPES OF HORIZONTAL FRAGMENTATION:
• 1)PRIMARY HF.
• 2)DERIVED HF.
• EXAMPLE OF HORIZONTAL FRAGMENTATION OF DATA FOR DISTRIBUTED
DATABASE
• PRIMARY HORIZONTAL FRAGMENTATION (PHF)
• PRIMARY HORIZONTAL FRAGMENTATION IS A TABLE FRAGMENTATION
TECHNIQUE IN WHICH WE FRAGMENT A SINGLE TABLE ROW-WISE,
USING A SET OF SIMPLE CONDITIONS.
• NOTE: CONDITIONS ARE ALSO CALLED PREDICATES.
• SIMPLE PREDICATE
• GIVEN A TABLE/RELATION R WITH SET OF ATTRIBUTES [A_1, A_2, A_3, A_4, …, A_n], A
SIMPLE PREDICATE P_i CAN BE EXPRESSED AS FOLLOWS:
• P_i : A_j θ VALUE
• WHERE θ CAN BE ANY OF THE SYMBOLS IN THE SET {≤, ≥, ≠, <, >, =}, AND VALUE
CAN BE ANY VALUE STORED IN THE TABLE FOR THE ATTRIBUTE A_j. FOR
EXAMPLE, CONSIDER THE FOLLOWING TABLE STUDENT GIVEN IN FIGURE 1:
RollNo Marks University
T01 33 Harvard
T03 77 Stanford
T04 23 California
T02 89 California
T05 90 Harvard
T06 90 Harvard
T07 15 Stanford
• FIGURE 1: STUDENT TABLE
• FOR THE ABOVE TABLE, WE COULD DEFINE SIMPLE PREDICATES SUCH AS
UNIVERSITY = ‘CALIFORNIA’, UNIVERSITY = ‘HARVARD’, MARKS < 77, ETC., USING
THE ABOVE EXPRESSION “A_j θ VALUE”.
• SET OF SIMPLE PREDICATES
• THE SET OF SIMPLE PREDICATES IS THE SET OF ALL CONDITIONS COLLECTIVELY REQUIRED
TO FRAGMENT A RELATION INTO SUBSETS. FOR A TABLE R, THE SET OF SIMPLE
PREDICATES CAN BE DEFINED AS:
• PREDICATE SET P = {P_1, P_2, …, P_n}
• EXAMPLE 1
• AS AN EXAMPLE, FOR THE ABOVE TABLE STUDENT, IF SIMPLE CONDITIONS ARE,
MARKS < 77, MARKS ≥ 77, THEN,
• SET OF SIMPLE PREDICATES P1 = {MARKS < 77, MARKS ≥ 77}
• MIN-TERM PREDICATE
• WHEN WE FRAGMENT ANY RELATION HORIZONTALLY, WE USE A SINGLE
CONDITION OR A SET OF SIMPLE PREDICATES TO FILTER THE DATA. GIVEN A
RELATION R AND A SET OF SIMPLE PREDICATES, WE CAN FRAGMENT THE RELATION
HORIZONTALLY AS FOLLOWS:
• FRAGMENT R_i = σ_{F_i}(R), 1 ≤ i ≤ n
• WHERE F_i IS A SELECTION FORMULA BUILT FROM THE SIMPLE PREDICATES, ALSO CALLED A MIN-TERM
PREDICATE, WHICH CAN BE WRITTEN AS FOLLOWS:
• MIN-TERM PREDICATE M_i = P_1* ∧ P_2* ∧ P_3* ∧ … ∧ P_n*
• HERE, P_1* STANDS FOR EITHER P_1 OR ¬(P_1), P_2* FOR EITHER P_2 OR ¬(P_2), P_3* FOR EITHER P_3 OR ¬(P_3),
AND SO ON. BY TAKING THE CONJUNCTION OF THE SIMPLE PREDICATES IN THEIR DIFFERENT
COMBINATIONS, WE CAN DERIVE MANY SUCH MIN-TERM PREDICATES.
• FOR EXAMPLE 1 STATED PREVIOUSLY, WE CAN DERIVE THE SET OF MIN-TERM PREDICATES
USING THE RULES STATED ABOVE AS FOLLOWS.
• WE GET 2^n MIN-TERM PREDICATES, WHERE n IS THE NUMBER OF SIMPLE PREDICATES IN
THE GIVEN PREDICATE SET. P1 HAS 2 SIMPLE PREDICATES. HENCE, WE GET 4
(2^2) POSSIBLE MIN-TERM PREDICATES:
• M1 = {MARKS < 77 ∧ MARKS ≥ 77}
• M2 = {MARKS < 77 ∧ ¬(MARKS ≥ 77)}
• M3 = {¬(MARKS < 77) ∧ MARKS ≥ 77}
• M4 = {¬(MARKS < 77) ∧ ¬(MARKS ≥ 77)}
• OUR NEXT STEP IS TO CHOOSE THE MIN-TERM PREDICATES THAT CAN ACTUALLY BE SATISFIED
AND ELIMINATE THE OTHERS, WHICH ARE NOT USEFUL. FOR
EXAMPLE, EACH OF THE ABOVE MIN-TERM PREDICATES CAN BE APPLIED AS A FORMULA F_i
IN THE ABOVE RULE FOR FRAGMENT R_i AS FOLLOWS:
• STUDENT1 = σ_{MARKS < 77 ∧ MARKS ≥ 77}(STUDENT)
• WHICH CAN BE WRITTEN AS THE EQUIVALENT SQL QUERY,
• STUDENT1
• SELECT * FROM STUDENT WHERE MARKS < 77 AND MARKS >= 77;
• NO ROW CAN SATISFY BOTH CONDITIONS AT ONCE, SO STUDENT1 IS EMPTY AND M1 IS ELIMINATED.
• STUDENT2 = σ_{MARKS < 77 ∧ ¬(MARKS ≥ 77)}(STUDENT)
• WHICH CAN BE WRITTEN AS THE EQUIVALENT SQL QUERY,
• STUDENT2
• SELECT * FROM STUDENT WHERE MARKS < 77 AND NOT (MARKS >= 77); WHERE
NOT (MARKS >= 77) IS EQUIVALENT TO MARKS < 77.
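• THE ENUMERATION OF MIN-TERM PREDICATES CAN BE SKETCHED IN PYTHON ON THE STUDENT
TABLE OF FIGURE 1. THE VARIABLE NAMES ARE HYPOTHETICAL. NOTE ONE SIMPLIFICATION:
THE SKETCH DROPS A MIN-TERM WHEN IT SELECTS NO ROW OF THIS SAMPLE DATA, WHICH IS A
WEAKER TEST THAN PROVING THAT THE PREDICATE IS LOGICALLY CONTRADICTORY.

```python
from itertools import product

# STUDENT rows from Figure 1: (RollNo, Marks, University)
STUDENT = [
    ("T01", 33, "Harvard"),
    ("T03", 77, "Stanford"),
    ("T04", 23, "California"),
    ("T02", 89, "California"),
    ("T05", 90, "Harvard"),
    ("T06", 90, "Harvard"),
    ("T07", 15, "Stanford"),
]

# The set of simple predicates from Example 1.
simple_predicates = [
    ("MARKS < 77", lambda row: row[1] < 77),
    ("MARKS >= 77", lambda row: row[1] >= 77),
]

fragments = {}
# Each simple predicate appears either in natural (True) or negated (False) form.
for signs in product([True, False], repeat=len(simple_predicates)):
    label = " AND ".join(
        name if keep else f"NOT ({name})"
        for (name, _), keep in zip(simple_predicates, signs)
    )
    fragments[label] = [
        row for row in STUDENT
        if all(pred(row) == keep
               for (_, pred), keep in zip(simple_predicates, signs))
    ]

# Keep only the min-terms that select at least one row.
useful = {label: rows for label, rows in fragments.items() if rows}
```

• WITH TWO SIMPLE PREDICATES THE LOOP BUILDS FOUR MIN-TERMS, AND ONLY THE TWO
THAT CORRESPOND TO M2 AND M3 SELECT ANY ROWS.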
• DERIVED HORIZONTAL FRAGMENTATION: THE PROCESS OF CREATING
HORIZONTAL FRAGMENTS OF A TABLE BASED ON THE ALREADY
CREATED HORIZONTAL FRAGMENTS OF ANOTHER RELATION (FOR EXAMPLE,
A BASE TABLE) IS CALLED DERIVED HORIZONTAL FRAGMENTATION. IT APPLIES,
FOR EXAMPLE, TO A RELATION THAT IS CONNECTED WITH ANOTHER
RELATION THROUGH A FOREIGN KEY.
• CONSIDER AN EXAMPLE WHERE AN ORGANIZATION MAINTAINS INFORMATION
ABOUT ITS CUSTOMERS. IT STORES INFORMATION ABOUT THE CUSTOMER IN THE
CUSTOMER TABLE AND THE CUSTOMER ADDRESSES IN THE C_ADDRESS TABLE AS
FOLLOWS:
• CUSTOMER(CID, CNAME, PROD_PURCHASED, SHOP_LOCATION)
• C_ADDRESS(CID, C_ADDRESS)
• THE TABLE CUSTOMER STORES INFORMATION ABOUT THE CUSTOMER, THE
PRODUCT PURCHASED FROM THEIR SHOP, AND THE SHOP LOCATION WHERE THE
PRODUCT IS PURCHASED. C_ADDRESS STORES INFORMATION ABOUT PERMANENT
AND PRESENT ADDRESSES OF THE CUSTOMER. HERE, CUSTOMER IS THE OWNER
RELATION AND C_ADDRESS IS THE MEMBER RELATION.
CUSTOMER
CID   CNAME     PROD_PURCHASED   SHOP_LOCATION
C001  Ram       Air Conditioner  Mumbai
C002  Guru      Television       Chennai
C010  Murugan   Television       Coimbatore
C003  Yuvraj    DVD Player       Pune
C004  Gopinath  Washing machine  Coimbatore
C_ADDRESS
CID   C_ADDRESS
C001  Bandra, Mumbai
C001  XYZ, Pune
C002  T.Nagar, Chennai
C002  Kovil street, Madurai
C003  ABX, Pune
C004  Gandhipuram, Ooty
C004  North street, Erode
C010  Peelamedu, Coimbatore
• IF THE ORGANIZATION FRAGMENTS THE RELATION
CUSTOMER ON THE SHOP_LOCATION ATTRIBUTE, IT NEEDS TO CREATE 4
FRAGMENTS USING THE HORIZONTAL FRAGMENTATION TECHNIQUE, AS GIVEN IN
FIGURE 3 BELOW.
CUSTOMER1
CID   CNAME     PROD_PURCHASED   SHOP_LOCATION
C001  Ram       Air Conditioner  Mumbai
CUSTOMER2
CID   CNAME     PROD_PURCHASED   SHOP_LOCATION
C002  Guru      Television       Chennai
CUSTOMER3
CID   CNAME     PROD_PURCHASED   SHOP_LOCATION
C010  Murugan   Television       Coimbatore
C004  Gopinath  Washing machine  Coimbatore
CUSTOMER4
CID   CNAME     PROD_PURCHASED   SHOP_LOCATION
C003  Yuvraj    DVD Player       Pune
• FIGURE 3: HORIZONTAL FRAGMENTS OF CUSTOMER ON SHOP_LOCATION
• NOW, IT IS NECESSARY TO FRAGMENT THE SECOND RELATION C_ADDRESS BASED ON THE
FRAGMENTS CREATED ON THE CUSTOMER RELATION. IF WE FRAGMENT C_ADDRESS IN ANY
OTHER WAY, A CUSTOMER'S ADDRESSES MAY END UP AT A DIFFERENT LOCATION FROM THE
CUSTOMER'S OTHER DATA. FOR EXAMPLE, IF C_ADDRESS IS FRAGMENTED ON THE LAST DIGIT OF THE
CID ATTRIBUTE, IT WILL END UP WITH MORE FRAGMENTS, AND THE DATA MAY
NOT BE STORED IN THE SAME LOCATION WHERE THE CUSTOMER INFORMATION IS STORED. THAT
IS, THE INFORMATION FOR CUSTOMER ‘RAM’ IS STORED IN MUMBAI AND HIS ADDRESS INFORMATION
MIGHT BE STORED SOMEWHERE ELSE. TO AVOID THIS, THE TABLE C_ADDRESS,
WHICH IS A MEMBER TABLE OF CUSTOMER, MUST BE FRAGMENTED INTO FOUR
FRAGMENTS BASED ON THE CUSTOMER TABLE FRAGMENTS GIVEN IN FIGURE 3. THIS TYPE
OF FRAGMENTATION BASED ON THE OWNER RELATION IS CALLED DERIVED HORIZONTAL
FRAGMENTATION. IT WORKS FOR RELATIONS THAT ARE JOINED BY AN EQUI-JOIN,
BECAUSE AN EQUI-JOIN CAN BE REPRESENTED AS A SET OF SEMI-JOINS.
• THE FRAGMENTATION OF C_ADDRESS IS DONE AS A SET OF SEMI-JOINS, AS FOLLOWS.
• C_ADDRESS1 = C_ADDRESS ⋉ CUSTOMER1
• C_ADDRESS2 = C_ADDRESS ⋉ CUSTOMER2
• C_ADDRESS3 = C_ADDRESS ⋉ CUSTOMER3
• C_ADDRESS4 = C_ADDRESS ⋉ CUSTOMER4
• THIS WILL RESULT IN FOUR FRAGMENTS OF C_ADDRESS WHERE THE CUSTOMER
ADDRESS OF ALL CUSTOMERS OF FRAGMENT CUSTOMER1 WILL GO INTO
C_ADDRESS1, AND THE CUSTOMER ADDRESS OF ALL CUSTOMERS OF FRAGMENT
CUSTOMER2 WILL GO INTO C_ADDRESS2, AND SO ON. THE RESULTANT
FRAGMENT OF C_ADDRESS WILL BE THE FOLLOWING.
• FIGURE 4: DERIVED HORIZONTAL FRAGMENTS OF FIGURE 2 AS A MEMBER
RELATION OF THE OWNER RELATION’S FRAGMENTS FROM FIGURE 3
C_ADDRESS1
CID   C_ADDRESS
C001  Bandra, Mumbai
C001  XYZ, Pune
C_ADDRESS2
CID   C_ADDRESS
C002  T.Nagar, Chennai
C002  Kovil street, Madurai
C_ADDRESS3
CID   C_ADDRESS
C004  Gandhipuram, Ooty
C004  North street, Erode
C010  Peelamedu, Coimbatore
C_ADDRESS4
CID   C_ADDRESS
C003  ABX, Pune
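• THE SEMI-JOIN STEP CAN BE SKETCHED IN PYTHON USING THE CUSTOMER AND C_ADDRESS
ROWS FROM THE EXAMPLE. THE HELPER NAMES horizontal_fragments AND semi_join ARE
HYPOTHETICAL, AND THE FRAGMENTS ARE KEYED BY SHOP LOCATION RATHER THAN BY
THE NUMBERS 1 TO 4.

```python
# Owner relation: (CID, CNAME, PROD_PURCHASED, SHOP_LOCATION)
CUSTOMER = [
    ("C001", "Ram", "Air Conditioner", "Mumbai"),
    ("C002", "Guru", "Television", "Chennai"),
    ("C010", "Murugan", "Television", "Coimbatore"),
    ("C003", "Yuvraj", "DVD Player", "Pune"),
    ("C004", "Gopinath", "Washing machine", "Coimbatore"),
]

# Member relation: (CID, C_ADDRESS)
C_ADDRESS = [
    ("C001", "Bandra, Mumbai"),
    ("C001", "XYZ, Pune"),
    ("C002", "T.Nagar, Chennai"),
    ("C002", "Kovil street, Madurai"),
    ("C003", "ABX, Pune"),
    ("C004", "Gandhipuram, Ooty"),
    ("C004", "North street, Erode"),
    ("C010", "Peelamedu, Coimbatore"),
]


def horizontal_fragments(rows, column):
    # Primary horizontal fragmentation: one fragment per distinct value.
    fragments = {}
    for row in rows:
        fragments.setdefault(row[column], []).append(row)
    return fragments


def semi_join(member, owner_fragment, key=0):
    # member ⋉ owner_fragment: member rows whose key occurs in the fragment.
    owner_keys = {row[key] for row in owner_fragment}
    return [row for row in member if row[key] in owner_keys]


customer_frags = horizontal_fragments(CUSTOMER, column=3)
address_frags = {
    location: semi_join(C_ADDRESS, fragment)
    for location, fragment in customer_frags.items()
}
```

• EACH ADDRESS ROW LANDS IN THE FRAGMENT OF ITS OWNING CUSTOMER, SO THE ADDRESSES
OF C004 AND C010 ARE STORED WITH THE COIMBATORE FRAGMENT EVEN THOUGH THE
ADDRESSES THEMSELVES NAME OTHER CITIES.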
• CHECKING FOR CORRECTNESS
• COMPLETENESS: CHECKING THE COMPLETENESS OF A DERIVED HORIZONTAL FRAGMENTATION IS MORE
DIFFICULT THAN FOR PRIMARY HORIZONTAL FRAGMENTATION, BECAUSE THE PREDICATES USED
DETERMINE THE FRAGMENTATION OF TWO RELATIONS. FORMALLY, FOR
FRAGMENTATIONS {R_1, R_2, …, R_n} AND {S_1, S_2, …, S_n} OF TWO RELATIONS R AND S,
THERE SHOULD BE ONE COMMON ATTRIBUTE, SAY A. THEN, FOR EACH TUPLE T OF R_i,
THERE SHOULD BE A TUPLE IN S_i THAT HAS THE SAME VALUE FOR A. THIS IS KNOWN
AS REFERENTIAL INTEGRITY.
• THE DERIVED FRAGMENTATION OF C_ADDRESS IS COMPLETE, BECAUSE THE VALUES OF THE
COMMON ATTRIBUTE CID IN THE FRAGMENTS CUSTOMERi AND C_ADDRESSi ARE THE SAME.
FOR EXAMPLE, THE VALUE PRESENT IN CID OF CUSTOMER1 IS ALSO, AND ONLY, PRESENT IN
C_ADDRESS1, ETC.
• RECONSTRUCTION: RECONSTRUCTION OF A RELATION FROM ITS
FRAGMENTS IS PERFORMED BY THE UNION OPERATOR IN BOTH THE PRIMARY
AND THE DERIVED HORIZONTAL FRAGMENTATION
• 2) VERTICAL FRAGMENTATION: VERTICAL FRAGMENTATION REFERS TO THE
PROCESS OF DECOMPOSING A TABLE VERTICALLY BY ATTRIBUTES, THAT IS, BY
COLUMNS. IN THIS FRAGMENTATION, SOME OF THE ATTRIBUTES ARE STORED IN
ONE SYSTEM AND THE REST ARE STORED IN OTHER SYSTEMS. THIS IS BECAUSE
EACH SITE MAY NOT NEED ALL COLUMNS OF A TABLE.
• => EACH SITE MAY NOT NEED ALL THE ATTRIBUTES OF A RELATION.
• => A VERTICAL FRAGMENT IS A SUBSET OF A RELATION CREATED FROM A SUBSET OF ITS COLUMNS.
• => A VF OF A RELATION R PRODUCES FRAGMENTS R1, R2, R3, …, RN, EACH OF WHICH
CONTAINS A SUBSET OF THE ATTRIBUTES OF R AND THE PRIMARY KEY OF R.
• => RECONSTRUCTION AFTER VF USES THE JOIN OPERATOR.
• => IN ORDER TO TAKE CARE OF RESTORATION, EACH FRAGMENT MUST
CONTAIN THE PRIMARY KEY FIELD(S) OF THE TABLE. THE FRAGMENTATION SHOULD
BE DONE IN SUCH A MANNER THAT WE CAN REBUILD THE TABLE FROM THE FRAGMENTS
BY TAKING THEIR NATURAL JOIN. TO MAKE THIS POSSIBLE WE CAN
ADD A SPECIAL ATTRIBUTE CALLED TUPLE-ID TO THE SCHEMA; ANY SUPER KEY
CAN ALSO BE USED FOR THIS PURPOSE. THROUGH IT, THE TUPLES OR
ROWS CAN BE LINKED TOGETHER. THE PROJECTION IS AS FOLLOWS:
• π_{A1, A2, …, AN}(T)
• WHERE π IS THE RELATIONAL ALGEBRA PROJECTION OPERATOR,
• A1, …, AN ARE THE ATTRIBUTES OF T,
• T IS THE TABLE (RELATION).
• FOR EXAMPLE, FOR THE EMPLOYEE TABLE WE HAVE T1 AS :
• ENO ENAME DESIGN TUPLE_ID
• 101 A ABC 1
• 102 B ABC 2
• 103 C ABC 3
• 104 D ABC 4
• 105 E ABC 5
• THE SECOND SUB-TABLE OF THE RELATION AFTER VERTICAL FRAGMENTATION IS
GIVEN AS FOLLOWS :
• SALARY DEP TUPLE_ID
• 3000 1 1
• 4000 2 2
• 5500 3 3
• 5000 1 4
• 2000 4 5
• THIS IS T2, AND TO GET BACK TO THE ORIGINAL T, WE JOIN THESE TWO
FRAGMENTS T1 AND T2 ON TUPLE_ID AS π_EMPLOYEE(T1 ⋈ T2), WHERE π_EMPLOYEE
PROJECTS THE ORIGINAL EMPLOYEE ATTRIBUTES.
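• THE RECONSTRUCTION BY JOIN CAN BE SKETCHED IN PYTHON AS FOLLOWS. THE SKETCH
ASSUMES THAT TUPLE_ID RUNS FROM 1 TO 5 IN BOTH FRAGMENTS, SO THAT EACH T1 ROW
MATCHES EXACTLY ONE T2 ROW. THE FUNCTION NAME natural_join_on_tuple_id IS
HYPOTHETICAL.

```python
# T1: (ENO, ENAME, DESIGN, TUPLE_ID)
T1 = [
    (101, "A", "ABC", 1),
    (102, "B", "ABC", 2),
    (103, "C", "ABC", 3),
    (104, "D", "ABC", 4),
    (105, "E", "ABC", 5),
]

# T2: (SALARY, DEP, TUPLE_ID)
T2 = [
    (3000, 1, 1),
    (4000, 2, 2),
    (5500, 3, 3),
    (5000, 1, 4),
    (2000, 4, 5),
]


def natural_join_on_tuple_id(t1, t2):
    # Index T2 by its last column (TUPLE_ID), then glue matching rows together
    # and drop TUPLE_ID, which plays the role of the final projection.
    t2_by_id = {row[-1]: row[:-1] for row in t2}
    return [row[:-1] + t2_by_id[row[-1]] for row in t1 if row[-1] in t2_by_id]


# Reconstructed rows: (ENO, ENAME, DESIGN, SALARY, DEP)
EMPLOYEE = natural_join_on_tuple_id(T1, T2)
```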
• 3. MIXED FRAGMENTATION – THE COMBINATION OF VERTICAL FRAGMENTATION
OF A TABLE FOLLOWED BY FURTHER HORIZONTAL FRAGMENTATION OF SOME
FRAGMENTS IS CALLED MIXED OR HYBRID FRAGMENTATION. TO DEFINE THIS
TYPE OF FRAGMENTATION WE USE THE SELECT AND THE PROJECT OPERATIONS
OF RELATIONAL ALGEBRA. IN SOME SITUATIONS, HORIZONTAL OR
VERTICAL FRAGMENTATION ALONE IS NOT ENOUGH TO DISTRIBUTE THE DATA FOR AN
APPLICATION, AND IN THOSE CASES WE NEED MIXED FRAGMENTATION.
• MIXED FRAGMENTATION CAN BE DONE IN TWO DIFFERENT WAYS:
• THE FIRST METHOD IS TO FIRST CREATE A SET OR GROUP OF HORIZONTAL
FRAGMENTS AND THEN CREATE VERTICAL FRAGMENTS FROM ONE OR MORE OF
THE HORIZONTAL FRAGMENTS.
• THE SECOND METHOD IS TO FIRST CREATE A SET OR GROUP OF VERTICAL
FRAGMENTS AND THEN CREATE HORIZONTAL FRAGMENTS FROM ONE OR MORE
OF THE VERTICAL FRAGMENTS.
• EACH MIXED FRAGMENT IS DEFINED BY COMBINING THE SELECT AND PROJECT
OPERATIONS, IN EITHER ORDER:
• σ_P(π_{A1, A2, …, AN}(T))
• π_{A1, A2, …, AN}(σ_P(T))
• THE ORIGINAL RELATION CAN BE OBTAINED FROM THE FRAGMENTS BY A COMBINATION
OF JOIN AND UNION OPERATIONS.
• FOR EXAMPLE, FOR OUR EMPLOYEE TABLE, ONE MIXED FRAGMENT IS
π_{ENAME, DESIGN}(σ_{ENO < 104}(EMPLOYEE))
• THE RESULT OF THIS FRAGMENTATION IS:
• ENAME DESIGN
• A ABC
• B ABC
• C ABC
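• A MINIMAL PYTHON SKETCH OF A MIXED FRAGMENT AS SELECT FOLLOWED BY PROJECT.
IT ASSUMES THE FULL EMPLOYEE ROWS OBTAINED BY COMBINING T1 AND T2, AND IT USES THE
CONDITION ENO < 104, WHICH IS THE ONE THAT YIELDS THE ROWS A, B AND C LISTED
ABOVE. THE HELPERS select AND project ARE HYPOTHETICAL NAMES.

```python
COLUMNS = ("ENO", "ENAME", "DESIGN", "SALARY", "DEP")

EMPLOYEE = [
    (101, "A", "ABC", 3000, 1),
    (102, "B", "ABC", 4000, 2),
    (103, "C", "ABC", 5500, 3),
    (104, "D", "ABC", 5000, 1),
    (105, "E", "ABC", 2000, 4),
]


def select(rows, predicate):
    # Relational selection: keep the rows that satisfy the predicate.
    return [row for row in rows if predicate(dict(zip(COLUMNS, row)))]


def project(rows, names):
    # Relational projection: keep only the named columns, in that order.
    positions = [COLUMNS.index(name) for name in names]
    return [tuple(row[i] for i in positions) for row in rows]


# Horizontal step (selection) first, then vertical step (projection).
mixed_fragment = project(
    select(EMPLOYEE, lambda r: r["ENO"] < 104),
    ["ENAME", "DESIGN"],
)
```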
• HYBRID FRAGMENTATION:
• IN HYBRID FRAGMENTATION, A COMBINATION OF HORIZONTAL AND VERTICAL
FRAGMENTATION TECHNIQUES IS USED. THIS IS THE MOST FLEXIBLE
FRAGMENTATION TECHNIQUE SINCE IT GENERATES FRAGMENTS WITH MINIMAL
EXTRANEOUS INFORMATION. HOWEVER, RECONSTRUCTION OF THE ORIGINAL
TABLE IS OFTEN AN EXPENSIVE TASK.
• HYBRID FRAGMENTATION CAN BE DONE IN TWO ALTERNATIVE WAYS −
• AT FIRST, GENERATE A SET OF HORIZONTAL FRAGMENTS; THEN GENERATE
VERTICAL FRAGMENTS FROM ONE OR MORE OF THE HORIZONTAL FRAGMENTS.
• AT FIRST, GENERATE A SET OF VERTICAL FRAGMENTS; THEN GENERATE
HORIZONTAL FRAGMENTS FROM ONE OR MORE OF THE VERTICAL FRAGMENTS.
• WHAT IS A TRANSACTION IN DISTRIBUTED DBMS:
• A PROGRAM THAT INCLUDES A COLLECTION OF DATABASE OPERATIONS WHICH
ARE EXECUTED AS A LOGICAL UNIT OF PROCESSING THE DATA IS KNOWN AS A
TRANSACTION. IN A TRANSACTION ONE OR MORE OF THE DATA OPERATIONS
ARE PERFORMED SUCH AS INSERT, UPDATE, DELETE OR RETRIEVE.
• TYPES :
• 1)LOCAL TRANSACTION
• 2)GLOBAL TRANSACTION
• 1) LOCAL TRANSACTION: SOME SOFTWARE PLATFORMS DO NOT PROVIDE TRANSACTION
COORDINATION AS PART OF THE KERNEL OPERATING SYSTEM. WHEN,
INSTEAD, EACH RESOURCE MANAGER INVOLVED SEPARATELY COORDINATES
ITS OWN CHANGES, AND ONLY ITS CHANGES, THE TRANSACTION IS KNOWN AS
A LOCAL TRANSACTION.
• WITH RESPECT TO A TRANSACTION, A SITE PLAYS ONE OF TWO ROLES:
• A) COORDINATING SITE:
• IT IS THE SITE WHERE THE TRANSACTION IS INITIATED.
• B) PARTICIPATING SITE:
• THESE ARE THE SITES WHERE SUBTRANSACTIONS ARE EXECUTED.
• 2)GLOBAL TRANSACTION:
• A GLOBAL TRANSACTION IS A MECHANISM THAT ALLOWS A SET OF
PROGRAMMING TASKS, POTENTIALLY USING MORE THAN ONE RESOURCE
MANAGER AND POTENTIALLY EXECUTING ON MULTIPLE SERVERS, TO BE
TREATED AS ONE LOGICAL UNIT. A GLOBAL TRANSACTION MAY BE
COMPOSED OF SEVERAL LOCAL TRANSACTIONS, EACH ACCESSING A SINGLE
RESOURCE MANAGER.
• TRANSACTION MANAGER(TM):
• EACH SITE HAS ITS OWN TM. IT MANAGES THE EXECUTION OF THOSE
TRANSACTIONS OR SUBTRANSACTIONS THAT ACCESS DATA STORED AT THAT SITE.
• TRANSACTION COORDINATOR:
• IT IS PRESENT AT EACH SITE AND IS RESPONSIBLE FOR COORDINATING THE
EXECUTION OF ALL TRANSACTIONS INITIATED AT THAT SITE.
• DISTRIBUTED TRANSACTION ARCHITECTURE:
• DISTRIBUTED TRANSACTION: A SET OF OPERATIONS ON DATA THAT IS
PERFORMED ACROSS TWO OR MORE DATA REPOSITORIES (ESPECIALLY
DATABASES). IT IS TYPICALLY COORDINATED ACROSS SEPARATE NODES
CONNECTED BY A NETWORK, BUT MAY ALSO SPAN MULTIPLE DATABASES ON A
SINGLE SERVER.
• ACID PROPERTIES:
• A TRANSACTION IS A SINGLE LOGICAL UNIT OF WORK WHICH ACCESSES AND
POSSIBLY MODIFIES THE CONTENTS OF A DATABASE. TRANSACTIONS ACCESS
DATA USING READ AND WRITE OPERATIONS.
IN ORDER TO MAINTAIN CONSISTENCY IN A DATABASE, BEFORE AND AFTER THE
TRANSACTION, CERTAIN PROPERTIES ARE FOLLOWED. THESE ARE
CALLED ACID PROPERTIES.
• ATOMICITY
BY THIS, WE MEAN THAT EITHER THE ENTIRE TRANSACTION TAKES PLACE AT
ONCE OR DOESN’T HAPPEN AT ALL. THERE IS NO MIDWAY I.E. TRANSACTIONS
DO NOT OCCUR PARTIALLY. EACH TRANSACTION IS CONSIDERED AS ONE UNIT
AND EITHER RUNS TO COMPLETION OR IS NOT EXECUTED AT ALL. IT INVOLVES
THE FOLLOWING TWO OPERATIONS.
—ABORT: IF A TRANSACTION ABORTS, CHANGES MADE TO DATABASE ARE NOT
VISIBLE.
—COMMIT: IF A TRANSACTION COMMITS, CHANGES MADE ARE VISIBLE.
ATOMICITY IS ALSO KNOWN AS THE ‘ALL OR NOTHING RULE’.
• CONSIDER THE FOLLOWING TRANSACTION T CONSISTING OF T1 AND T2:
TRANSFER OF 100 FROM ACCOUNT X TO ACCOUNT Y. T1 DEDUCTS 100 FROM X
(READ(X), X := X - 100, WRITE(X)) AND T2 ADDS 100 TO Y
(READ(Y), Y := Y + 100, WRITE(Y)).
• IF THE TRANSACTION FAILS AFTER COMPLETION OF T1 BUT BEFORE
COMPLETION OF T2 (SAY, AFTER WRITE(X) BUT BEFORE WRITE(Y)), THEN THE
AMOUNT HAS BEEN DEDUCTED FROM X BUT NOT ADDED TO Y. THIS RESULTS IN
AN INCONSISTENT DATABASE STATE. THEREFORE, THE TRANSACTION MUST BE
EXECUTED IN ITS ENTIRETY IN ORDER TO ENSURE CORRECTNESS OF THE DATABASE
STATE.
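• THE ALL-OR-NOTHING BEHAVIOUR CAN BE TRIED OUT WITH PYTHON'S BUILT-IN sqlite3
MODULE. THIS IS A SINGLE-SITE SKETCH, NOT A DISTRIBUTED ONE: THE TABLE NAME account,
THE FUNCTION transfer AND THE fail_midway FLAG ARE HYPOTHETICAL, AND THE STARTING
BALANCES X = 500 AND Y = 200 ARE ASSUMED.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO account VALUES (?, ?)", [("X", 500), ("Y", 200)])
conn.commit()


def transfer(conn, amount, fail_midway=False):
    try:
        # T1: deduct the amount from X.
        conn.execute(
            "UPDATE account SET balance = balance - ? WHERE name = 'X'",
            (amount,),
        )
        if fail_midway:
            raise RuntimeError("simulated failure after WRITE(X)")
        # T2: add the amount to Y.
        conn.execute(
            "UPDATE account SET balance = balance + ? WHERE name = 'Y'",
            (amount,),
        )
        conn.commit()      # COMMIT: both changes become visible together
    except RuntimeError:
        conn.rollback()    # ABORT: the partial change to X is undone


def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM account"))


transfer(conn, 100, fail_midway=True)
after_failure = balances(conn)

transfer(conn, 100)
after_success = balances(conn)
```

• AFTER THE SIMULATED FAILURE THE BALANCES ARE UNCHANGED, AND AFTER THE SUCCESSFUL
RUN THE TOTAL X + Y IS STILL 700.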
• CONSISTENCY
THIS MEANS THAT INTEGRITY CONSTRAINTS MUST BE MAINTAINED SO THAT
THE DATABASE IS CONSISTENT BEFORE AND AFTER THE TRANSACTION. IT
REFERS TO THE CORRECTNESS OF A DATABASE. REFERRING TO THE EXAMPLE
ABOVE,
THE TOTAL AMOUNT BEFORE AND AFTER THE TRANSACTION MUST BE
MAINTAINED.
TOTAL BEFORE T OCCURS = 500 + 200 = 700.
TOTAL AFTER T OCCURS = 400 + 300 = 700.
THEREFORE, DATABASE IS CONSISTENT. INCONSISTENCY OCCURS IN
CASE T1 COMPLETES BUT T2 FAILS. AS A RESULT T IS INCOMPLETE.
• ISOLATION
THIS PROPERTY ENSURES THAT MULTIPLE TRANSACTIONS CAN OCCUR
CONCURRENTLY WITHOUT LEADING TO AN INCONSISTENT DATABASE
STATE. TRANSACTIONS OCCUR INDEPENDENTLY WITHOUT INTERFERENCE.
CHANGES OCCURRING IN A PARTICULAR TRANSACTION WILL NOT BE VISIBLE TO
ANY OTHER TRANSACTION UNTIL THAT PARTICULAR CHANGE IN THAT
TRANSACTION IS WRITTEN TO MEMORY OR HAS BEEN COMMITTED. THIS
PROPERTY ENSURES THAT EXECUTING TRANSACTIONS CONCURRENTLY
RESULTS IN A STATE THAT IS EQUIVALENT TO THE STATE ACHIEVED IF THEY
WERE EXECUTED SERIALLY IN SOME ORDER.
• LET X = 500, Y = 500.
CONSIDER TWO TRANSACTIONS T AND T’’. T MULTIPLIES X BY 100 AND THEN
SUBTRACTS 50 FROM Y; T’’ READS X AND Y AND COMPUTES THEIR SUM.
• SUPPOSE T HAS BEEN EXECUTED TILL READ(Y) AND THEN T’’ STARTS. AS A
RESULT, INTERLEAVING OF OPERATIONS TAKES PLACE, DUE TO WHICH T’’ READS
THE CORRECT VALUE OF X BUT AN INCORRECT VALUE OF Y, AND THE SUM COMPUTED BY
T’’: (X + Y = 50,000 + 500 = 50,500)
IS THUS NOT CONSISTENT WITH THE SUM AT THE END OF TRANSACTION
T: (X + Y = 50,000 + 450 = 50,450).
THIS RESULTS IN DATABASE INCONSISTENCY, DUE TO A DISCREPANCY OF 50 UNITS.
HENCE, TRANSACTIONS MUST TAKE PLACE IN ISOLATION AND CHANGES
SHOULD BE VISIBLE ONLY AFTER THEY HAVE BEEN WRITTEN TO THE MAIN
MEMORY.
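• THE INTERLEAVING ABOVE CAN BE REPLAYED WITH A SMALL PYTHON SIMULATION. IT
ASSUMES, AS THE NUMBERS IMPLY, THAT T MULTIPLIES X BY 100 AND SUBTRACTS 50 FROM Y,
AND THAT THE SECOND TRANSACTION (CALLED T2 IN THE CODE) ONLY COMPUTES X + Y.
THE STEP NAMES AND THE FUNCTION run ARE HYPOTHETICAL.

```python
def run(schedule):
    db = {"X": 500, "Y": 500}
    t_local = {}     # values read by T
    observed = {}    # sum computed by the second transaction

    steps = {
        "T.update_X": lambda: db.update(X=db["X"] * 100),
        "T.read_Y": lambda: t_local.update(Y=db["Y"]),
        "T.write_Y": lambda: db.update(Y=t_local["Y"] - 50),
        "T2.sum": lambda: observed.update(Z=db["X"] + db["Y"]),
    }
    for name in schedule:
        steps[name]()

    # (sum seen by the second transaction, true sum once T has finished)
    return observed["Z"], db["X"] + db["Y"]


# The second transaction starts after T has executed up to READ(Y).
interleaved = run(["T.update_X", "T.read_Y", "T2.sum", "T.write_Y"])

# Serial execution: the second transaction runs only after T has finished.
serial = run(["T.update_X", "T.read_Y", "T.write_Y", "T2.sum"])
```

• IN THE INTERLEAVED SCHEDULE THE OBSERVED SUM DIFFERS FROM THE FINAL SUM BY 50,
WHILE IN THE SERIAL SCHEDULE THE TWO AGREE.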
• DURABILITY:
THIS PROPERTY ENSURES THAT ONCE THE TRANSACTION HAS COMPLETED
EXECUTION, THE UPDATES AND MODIFICATIONS TO THE DATABASE ARE
STORED IN AND WRITTEN TO DISK AND THEY PERSIST EVEN IF A SYSTEM
FAILURE OCCURS. THESE UPDATES NOW BECOME PERMANENT AND ARE STORED
IN NON-VOLATILE MEMORY. THE EFFECTS OF THE TRANSACTION, THUS, ARE
NEVER LOST.
• THE ACID PROPERTIES, IN TOTALITY, PROVIDE A MECHANISM TO ENSURE
CORRECTNESS AND CONSISTENCY OF A DATABASE IN A WAY SUCH THAT EACH
TRANSACTION IS A GROUP OF OPERATIONS THAT ACTS A SINGLE UNIT,
PRODUCES CONSISTENT RESULTS, ACTS IN ISOLATION FROM OTHER
OPERATIONS AND UPDATES THAT IT MAKES ARE DURABLY STORED.
• CONCURRENCY CONTROL:
• IT IS THE PROCESS OF MANAGING SIMULTANEOUS EXECUTION OF TRANSACTION IN
A SHARED DB.
• CONCURRENCY CONTROL IS PROVIDED IN A DATABASE TO:
• (I) ENFORCE ISOLATION AMONG TRANSACTIONS.
• (II) PRESERVE DATABASE CONSISTENCY THROUGH CONSISTENCY PRESERVING
EXECUTION OF TRANSACTIONS.
• (III) RESOLVE READ-WRITE AND WRITE-READ CONFLICTS.
• LOCK BASED PROTOCOLS –
A LOCK IS A VARIABLE ASSOCIATED WITH A DATA ITEM THAT DESCRIBES THE
STATUS OF THE DATA ITEM WITH RESPECT TO THE OPERATIONS THAT CAN BE
APPLIED TO IT. LOCKS SYNCHRONIZE THE ACCESS BY CONCURRENT
TRANSACTIONS TO THE DATABASE ITEMS. THIS PROTOCOL REQUIRES
THAT ALL DATA ITEMS BE ACCESSED IN A MUTUALLY EXCLUSIVE
MANNER; IN OTHER WORDS, A LOCK GUARANTEES EXCLUSIVE USE OF A DATA ITEM
TO THE CURRENT TRANSACTION.
• => BEFORE ACCESSING THE DATA ITEM: ACQUIRE THE LOCK
• => AFTER COMPLETION OF THE TRANSACTION: RELEASE THE LOCK
• 1) SHARED LOCK (S): ALSO KNOWN AS A READ-ONLY LOCK. AS THE NAME
SUGGESTS, IT CAN BE SHARED BETWEEN TRANSACTIONS, BECAUSE A TRANSACTION
HOLDING THIS LOCK DOES NOT HAVE PERMISSION TO
UPDATE THE DATA ITEM. AN S-LOCK IS REQUESTED USING THE LOCK-S
INSTRUCTION.
• 2) EXCLUSIVE LOCK (X): THE DATA ITEM CAN BE BOTH READ AND
WRITTEN. THIS LOCK IS EXCLUSIVE: IT CANNOT BE HELD BY MORE THAN ONE
TRANSACTION AT THE SAME TIME ON THE
SAME DATA ITEM. AN X-LOCK IS REQUESTED USING THE LOCK-X INSTRUCTION.
• LOCK COMPATIBILITY MATRIX –
REQUESTED \ HELD      S       X
S                     YES     NO
X                     NO      NO
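• THE COMPATIBILITY OF THE TWO LOCK MODES CAN BE WRITTEN AS A SMALL PYTHON LOOKUP TABLE. THIS IS ONLY A SKETCH: THE ENTRIES FOLLOW FROM THE DEFINITIONS OF SHARED AND EXCLUSIVE LOCKS GIVEN ABOVE, AND THE NAMES COMPATIBLE AND can_grant ARE ASSUMPTIONS MADE FOR ILLUSTRATION.

```python
# (requested mode, mode held by another transaction) -> compatible?
COMPATIBLE = {
    ("S", "S"): True,    # several readers may share an item
    ("S", "X"): False,   # no reading while another transaction may write
    ("X", "S"): False,   # no writing while another transaction reads
    ("X", "X"): False,   # only one writer at a time
}

def can_grant(requested, held_by_others):
    """Return True if `requested` ("S" or "X") is compatible with every lock
    mode that other transactions currently hold on the same data item."""
    return all(COMPATIBLE[(requested, held)] for held in held_by_others)

print(can_grant("S", ["S", "S"]))  # True
print(can_grant("X", ["S"]))       # False
print(can_grant("X", []))          # True
```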
• CONVERSION OF LOCKS:
=> UNDER TWO-PHASE LOCKING, ACQUIRING LOCKS ON DATA HAPPENS IN THE FIRST (GROWING) PHASE
AND RELEASING LOCKS ON DATA HAPPENS IN THE SECOND (SHRINKING) PHASE. CONVERTING
A LOCK FROM SHARED TO EXCLUSIVE (UPGRADING) CAN BE DONE ONLY IN THE FIRST PHASE,
WHILE CONVERTING FROM EXCLUSIVE TO SHARED (DOWNGRADING) IS
DONE ONLY IN THE RELEASE PHASE.
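• THE TWO PHASES AND THE CONVERSION RULE CAN BE SKETCHED IN PYTHON AS FOLLOWS. THE CLASS TwoPhaseTransaction AND ITS METHOD NAMES ARE HYPOTHETICAL, AND THE SKETCH TRACKS ONLY ONE TRANSACTION'S OWN LOCKS; CONFLICTS WITH OTHER TRANSACTIONS ARE DELIBERATELY LEFT OUT.

```python
class TwoPhaseTransaction:
    """Tracks one transaction's locks and enforces the two phases."""

    def __init__(self):
        self.locks = {}          # data item -> "S" or "X"
        self.shrinking = False   # True once the release phase has started

    def lock(self, item, mode):
        # Acquiring and upgrading (S -> X) are allowed only in the first phase.
        if self.shrinking:
            raise RuntimeError("cannot acquire or upgrade in the release phase")
        if self.locks.get(item) == "X" and mode == "S":
            raise RuntimeError("use downgrade() to go from X to S")
        self.locks[item] = mode

    def downgrade(self, item):
        # Downgrading (X -> S) belongs to the release phase, so it starts it.
        if self.locks.get(item) != "X":
            raise RuntimeError("only an X lock can be downgraded")
        self.shrinking = True
        self.locks[item] = "S"

    def unlock(self, item):
        # The first release also starts the second phase.
        self.shrinking = True
        del self.locks[item]

t = TwoPhaseTransaction()
t.lock("A", "S")      # first phase: acquire
t.lock("A", "X")      # first phase: upgrade S -> X
t.downgrade("A")      # second phase begins: X -> S
t.unlock("A")         # second phase: release
```

AFTER THE FIRST DOWNGRADE OR UNLOCK, ANY FURTHER CALL TO lock RAISES AN ERROR, WHICH IS HOW THE SKETCH KEEPS THE TWO PHASES SEPARATE.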
• TWO-PHASE COMMIT PROTOCOL:
• COMMIT REQUEST OR VOTING PHASE:
• THE COORDINATOR SENDS A QUERY-TO-COMMIT MESSAGE TO ALL
PARTICIPANTS AND WAITS UNTIL IT HAS RECEIVED A REPLY FROM ALL
PARTICIPANTS.
• THE PARTICIPANTS EXECUTE THE TRANSACTION UP TO THE POINT WHERE THEY
WILL BE ASKED TO COMMIT. THEY EACH WRITE AN ENTRY TO THEIR UNDO LOG
AND AN ENTRY TO THEIR REDO LOG.
• EACH PARTICIPANT REPLIES WITH AN AGREEMENT MESSAGE (PARTICIPANT
VOTES YES TO COMMIT), IF THE PARTICIPANT'S ACTIONS SUCCEEDED, OR AN
ABORT MESSAGE (PARTICIPANT VOTES NO, NOT TO COMMIT), IF THE
PARTICIPANT EXPERIENCES A FAILURE THAT WILL MAKE IT IMPOSSIBLE TO
COMMIT.
• COMMIT OR COMPLETION PHASE OR DECISION PHASE:
• IF THE COORDINATOR RECEIVED AN AGREEMENT MESSAGE FROM ALL PARTICIPANTS, IT SENDS A COMMIT MESSAGE TO ALL THE PARTICIPANTS.
• EACH PARTICIPANT COMPLETES THE OPERATION, AND RELEASES ALL THE LOCKS
AND RESOURCES HELD DURING THE TRANSACTION.
• EACH PARTICIPANT SENDS AN ACKNOWLEDGEMENT TO THE COORDINATOR.
• THE COORDINATOR COMPLETES THE TRANSACTION WHEN ALL
ACKNOWLEDGMENTS HAVE BEEN RECEIVED.
TWO RULES:
1) IF THE COORDINATOR RECEIVES A <READY T> MESSAGE FROM ALL SITES, THE TRANSACTION T IS COMMITTED.
2) IF THE COORDINATOR RECEIVES AT LEAST ONE <NOT READY T> MESSAGE FROM A SITE, THE TRANSACTION T IS
ABORTED.
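• THE TWO PHASES AND THE TWO RULES CAN BE SKETCHED IN PYTHON AS FOLLOWS. THIS IS A MINIMAL IN-MEMORY ILLUSTRATION, NOT A REAL IMPLEMENTATION: THE Participant CLASS, ITS prepare/commit/abort METHODS AND THE VOTE STRINGS ARE ASSUMPTIONS, AND MESSAGE LOSS, TIMEOUTS AND LOGGING ARE NOT MODELLED.

```python
class Participant:
    """A site taking part in the transaction (hypothetical interface)."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.state = "ACTIVE"

    def prepare(self, transaction):
        # Voting phase: answer READY (yes) or NOT READY (no).
        self.state = "READY" if self.can_commit else "NOT READY"
        return self.state

    def commit(self, transaction):
        self.state = "COMMITTED"   # complete the work, release locks

    def abort(self, transaction):
        self.state = "ABORTED"     # undo the work, release locks


def two_phase_commit(participants, transaction):
    """Run both phases and return the coordinator's decision."""
    # Phase 1 (voting): collect one vote from every participant.
    votes = [p.prepare(transaction) for p in participants]
    # Rule 1: all READY -> commit. Rule 2: any NOT READY -> abort.
    decision = "COMMIT" if all(v == "READY" for v in votes) else "ABORT"
    # Phase 2 (decision): send the decision to every participant.
    for p in participants:
        if decision == "COMMIT":
            p.commit(transaction)
        else:
            p.abort(transaction)
    return decision


print(two_phase_commit([Participant("S1"), Participant("S2")], "T"))
print(two_phase_commit([Participant("S1"), Participant("S2", can_commit=False)], "T"))
```

THE FIRST CALL COMMITS BECAUSE EVERY SITE VOTES READY; THE SECOND ABORTS BECAUSE ONE SITE VOTES NOT READY.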
• RECOVERY IN DDB
• POSSIBLE FAILURES:
• 1) LOSS OF MESSAGES.
• 2) FAILURE OF THE SITE AT WHICH THE TRANSACTION IS RUNNING.
• 3) FAILURE OF A COMMUNICATION LINK.
• FAILURE OF A PARTICIPATING SITE:
• 1) THE SITE FAILS BEFORE SENDING <READY T>.
• 2) THE SITE FAILS AFTER SENDING <READY T>.
• HANDLING A FAILURE OF A PARTICIPATING SITE:
• 1. THE RESPONSE OF THE TRANSACTION COORDINATOR (TC) OF TRANSACTION T.
• IF THE FAILED SITE HAS NOT SENT ANY <READY T> MESSAGE, THE TC
CANNOT DECIDE TO COMMIT THE TRANSACTION. [REMEMBER, IN A DISTRIBUTED
DATABASE ALL THE PARTICIPATING SITES MUST BE READY TO COMMIT; IF EVEN
ONE SITE IS NOT READY, THE WHOLE TRANSACTION MUST BE
ABORTED BY THE TC.] HENCE, THE TRANSACTION T SHOULD BE ABORTED AND THE
OTHER PARTICIPATING SITES INFORMED.
• 2. THE RESPONSE OF THE FAILED SITE WHEN IT RECOVERS.
• WHEN IT RECOVERS FROM THE FAILURE, THE RECOVERING SITE SI MUST IDENTIFY THE
FATE OF THE TRANSACTIONS THAT WERE IN PROGRESS WHEN
SI FAILED. THIS CAN BE DONE BY EXAMINING THE LOG FILE ENTRIES OF SITE SI.
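• ONE WAY THE RECOVERING SITE MIGHT EXAMINE ITS LOG IS SKETCHED BELOW IN PYTHON. THE SLIDES DO NOT SPELL OUT THE RULES, SO THE SKETCH ASSUMES THE USUAL TEXTBOOK CONVENTION FOR 2PC RECOVERY; THE RECORD NAMES ("READY", "COMMIT", "ABORT") AND THE FUNCTION recovery_action ARE ILLUSTRATIVE.

```python
def recovery_action(log_records, transaction):
    """Decide what a recovering site does for one transaction, based only on
    the control records in its own log (a list of (kind, txn) tuples)."""
    kinds = {kind for kind, txn in log_records if txn == transaction}
    if "commit" in kinds:
        return "redo"              # outcome was commit: reapply the updates
    if "abort" in kinds:
        return "undo"              # outcome was abort: roll the updates back
    if "ready" in kinds:
        return "ask coordinator"   # voted ready but never learned the outcome
    return "undo"                  # never voted, so T cannot have committed

log = [("ready", "T1"), ("commit", "T1"), ("ready", "T2"), ("start", "T3")]
for txn in ("T1", "T2", "T3"):
    print(txn, recovery_action(log, txn))
```

THE INTERESTING CASE IS T2: THE SITE FAILED AFTER SENDING ITS READY VOTE, SO IT CANNOT DECIDE ALONE AND MUST LEARN THE OUTCOME FROM THE COORDINATOR.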
• FAILURE OF THE COORDINATING SITE (THE COORDINATOR MAY FAIL AT ANY OF THESE POINTS):
• 1) AFTER THE COORDINATOR WRITES THE PREPARE LOG RECORD AND BEFORE
THE GATEWAY COMMIT PHASE.
• 2) AFTER THE COORDINATOR SENDS A COMMIT MESSAGE TO THE GATEWAY BUT
BEFORE IT RECEIVES A REPLY.
• 3) AFTER THE GATEWAY COMMIT PHASE BUT BEFORE THE COORDINATOR WRITES A
COMMIT RECORD TO THE LOGICAL LOG.
• HANDLING THE FAILURE OF A COORDINATOR SITE
• SUPPOSE THAT THE COORDINATOR SITE FAILS DURING EXECUTION OF THE TWO-PHASE
COMMIT (2PC) PROTOCOL FOR A TRANSACTION T. THIS SITUATION CAN BE
HANDLED IN TWO WAYS:
• THE OTHER SITES PARTICIPATING IN THE TRANSACTION T MAY TRY TO
DECIDE THE FATE OF THE TRANSACTION. THAT IS, THEY MAY TRY TO DECIDE WHETHER TO
COMMIT OR ABORT T USING THE CONTROL MESSAGES AVAILABLE AT EVERY SITE.
• THE SECOND WAY IS TO WAIT UNTIL THE COORDINATOR SITE RECOVERS.
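• THE FIRST OPTION, DECIDING FROM THE CONTROL MESSAGES HELD AT THE SURVIVING SITES, CAN BE SKETCHED IN PYTHON. THE DECISION RULES BELOW ARE THE USUAL TEXTBOOK ONES AND ARE AN ASSUMPTION HERE, SINCE THE SLIDES DO NOT LIST THEM; THE FUNCTION NAME AND RECORD NAMES ARE ILLUSTRATIVE.

```python
def decide_without_coordinator(site_logs):
    """site_logs maps each reachable site to the set of control records
    ("ready", "commit", "abort") it holds for transaction T."""
    records = list(site_logs.values())
    if any("commit" in r for r in records):
        return "COMMIT"   # the coordinator had already decided to commit
    if any("abort" in r for r in records):
        return "ABORT"    # the coordinator had already decided to abort
    if any("ready" not in r for r in records):
        return "ABORT"    # some site never voted, so commit was impossible
    return "WAIT"         # every site is ready; only the coordinator knows

print(decide_without_coordinator({"S1": {"ready"}, "S2": {"ready", "commit"}}))
print(decide_without_coordinator({"S1": {"ready"}, "S2": set()}))
print(decide_without_coordinator({"S1": {"ready"}, "S2": {"ready"}}))
```

THE LAST CASE RETURNS "WAIT": WHEN EVERY SITE HAS VOTED READY AND NONE KNOWS THE OUTCOME, THE SITES FALL BACK TO THE SECOND OPTION AND WAIT FOR THE COORDINATOR TO RECOVER.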
• QUERY PROCESSING IN DISTRIBUTED DBMS
• QUERY PROCESSING IN A DISTRIBUTED DATABASE MANAGEMENT
SYSTEM REQUIRES THE TRANSMISSION OF DATA BETWEEN THE COMPUTERS IN A
NETWORK. A DISTRIBUTION STRATEGY FOR A QUERY IS THE ORDERING OF DATA
TRANSMISSIONS AND LOCAL DATA PROCESSING IN THE DATABASE SYSTEM.
GENERALLY, A QUERY IN A DISTRIBUTED DBMS REQUIRES DATA FROM MULTIPLE SITES,
AND TRANSMITTING THAT DATA BETWEEN SITES INCURS COMMUNICATION COSTS.
QUERY PROCESSING IN A DISTRIBUTED DBMS
DIFFERS FROM QUERY PROCESSING IN A CENTRALIZED DBMS BECAUSE OF THIS
COMMUNICATION COST OF DATA TRANSFER OVER THE NETWORK. THE
TRANSMISSION COST IS LOW WHEN SITES ARE CONNECTED THROUGH HIGH-SPEED
NETWORKS AND IS QUITE SIGNIFICANT IN OTHER NETWORKS.
• 1. COSTS (TRANSFER OF DATA) OF DISTRIBUTED QUERY PROCESSING:
• IN DISTRIBUTED QUERY PROCESSING, THE DATA TRANSFER COST IS THE COST OF TRANSFERRING
INTERMEDIATE FILES TO OTHER SITES FOR PROCESSING PLUS THE
COST OF TRANSFERRING THE FINAL RESULT FILES TO THE SITE WHERE
THE RESULT IS REQUIRED. SUPPOSE A USER SENDS A QUERY TO SITE S1
THAT REQUIRES DATA FROM S1 ITSELF AND FROM ANOTHER SITE S2.
THERE ARE THREE STRATEGIES TO PROCESS THIS QUERY:
• 1) TRANSFER THE DATA FROM S2 TO S1 AND THEN PROCESS THE
QUERY.
• 2) TRANSFER THE DATA FROM S1 TO S2 AND THEN PROCESS THE
QUERY.
• 3) TRANSFER THE DATA FROM S1 AND S2 TO A THIRD SITE S3 AND THEN PROCESS
THE QUERY.
• THE CHOICE DEPENDS ON FACTORS SUCH AS THE SIZE OF THE
RELATIONS AND THE RESULTS, THE COMMUNICATION COST BETWEEN
DIFFERENT SITES, AND THE SITE AT WHICH THE RESULT WILL BE USED.
COMMONLY, THE DATA TRANSFER COST IS CALCULATED IN TERMS OF THE SIZE
OF THE MESSAGES, USING THE FOLLOWING FORMULA:
DATA TRANSFER COST = C * SIZE
WHERE C IS THE COST PER BYTE OF DATA TRANSFERRED AND SIZE IS
THE NUMBER OF BYTES TRANSMITTED.
• EXAMPLE: CONSIDER THE FOLLOWING TABLES EMPLOYEE AND DEPARTMENT
• SITE 1: EMPLOYEE (EID, NAME, SALARY, DID)
EID - 10 BYTES
NAME - 20 BYTES
SALARY - 20 BYTES
DID - 10 BYTES
TOTAL RECORDS - 1000
RECORD SIZE - 60 BYTES
• SITE 2: DEPARTMENT (DID, DNAME)
DID - 10 BYTES
DNAME - 20 BYTES
TOTAL RECORDS - 50
RECORD SIZE - 30 BYTES
• EXAMPLE: FIND THE NAMES OF EMPLOYEES AND THEIR DEPARTMENT NAMES. ALSO, FIND THE
AMOUNT OF DATA TRANSFERRED TO EXECUTE THIS QUERY WHEN THE QUERY IS SUBMITTED AT
SITE 3.
• ANSWER: THE QUERY IS SUBMITTED AT SITE 3, AND NEITHER OF THE TWO
RELATIONS, EMPLOYEE AND DEPARTMENT, IS AVAILABLE AT SITE 3. SO, TO
EXECUTE THIS QUERY, WE HAVE THREE STRATEGIES:
• 1) TRANSFER BOTH TABLES, EMPLOYEE AND DEPARTMENT, TO SITE 3 AND JOIN
THEM THERE. THE TOTAL COST IS 1000 * 60 + 50 * 30 = 60,000 + 1,500 =
61,500 BYTES.
• 2) TRANSFER THE TABLE EMPLOYEE TO SITE 2, JOIN THE TABLES AT SITE 2, AND THEN TRANSFER
THE RESULT TO SITE 3. THE RESULT HAS 1000 TUPLES OF NAME AND DNAME, THAT IS
20 + 20 = 40 BYTES EACH, SO THE TOTAL COST IS 60 * 1000 + 40 * 1000 = 100,000 BYTES.
• 3) TRANSFER THE TABLE DEPARTMENT TO SITE 1, JOIN THE TABLES AT SITE 1,
AND THEN TRANSFER THE RESULT TO SITE 3. THE
TOTAL COST IS 30 * 50 + 40 * 1000 = 41,500 BYTES, SINCE WE AGAIN HAVE TO
TRANSFER 1000 TUPLES OF NAME AND DNAME, 40 BYTES EACH, FROM SITE 1 TO SITE 3.
• NOW, IF THE OPTIMISATION CRITERION IS TO MINIMISE THE AMOUNT OF DATA
TRANSFERRED, WE CHOOSE STRATEGY 3.
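• THE COST FORMULA CAN BE APPLIED TO THE THREE STRATEGIES IN A SHORT PYTHON SKETCH. IT ASSUMES C = 1 (SO THE COST EQUALS THE BYTES SHIPPED) AND SIZES EACH RESULT TUPLE FROM THE FIELD SIZES LISTED FOR THE TWO TABLES (NAME 20 BYTES + DNAME 20 BYTES); THE DICTIONARY LAYOUT AND THE FUNCTION NAME table_bytes ARE ILLUSTRATIVE.

```python
EMPLOYEE = {"fields": {"EID": 10, "NAME": 20, "SALARY": 20, "DID": 10}, "records": 1000}
DEPARTMENT = {"fields": {"DID": 10, "DNAME": 20}, "records": 50}

def table_bytes(table, columns=None):
    """Bytes needed to ship `columns` (default: every column) of all records."""
    columns = columns or table["fields"]
    return table["records"] * sum(table["fields"][c] for c in columns)

# One result tuple per employee, carrying NAME and DNAME only.
result_bytes = EMPLOYEE["records"] * (
    EMPLOYEE["fields"]["NAME"] + DEPARTMENT["fields"]["DNAME"]
)

costs = {
    "1: ship both tables to site 3": table_bytes(EMPLOYEE) + table_bytes(DEPARTMENT),
    "2: ship EMPLOYEE to site 2, result to site 3": table_bytes(EMPLOYEE) + result_bytes,
    "3: ship DEPARTMENT to site 1, result to site 3": table_bytes(DEPARTMENT) + result_bytes,
}
for name, cost in costs.items():
    print(name, cost)
```

CHANGING THE FIELD SIZES OR RECORD COUNTS IN THE TWO DICTIONARIES SHOWS HOW THE CHEAPEST STRATEGY DEPENDS ON THE SIZE OF THE RELATIONS AND OF THE RESULT.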
• 2. USING SEMI-JOIN IN DISTRIBUTED QUERY PROCESSING:
• THE SEMI-JOIN OPERATION IS USED IN DISTRIBUTED QUERY PROCESSING TO REDUCE
THE NUMBER OF TUPLES IN A TABLE BEFORE TRANSMITTING IT TO ANOTHER SITE.
THIS REDUCTION IN THE NUMBER OF TUPLES REDUCES THE NUMBER AND THE TOTAL
SIZE OF THE TRANSMISSIONS, WHICH ULTIMATELY REDUCES THE TOTAL COST OF
DATA TRANSFER. SUPPOSE WE HAVE TWO TABLES, R1 ON SITE S1 AND R2 ON SITE S2.
WE FORWARD THE JOINING COLUMN OF ONE TABLE, SAY R1, TO THE SITE
WHERE THE OTHER TABLE, R2, IS LOCATED. THIS COLUMN IS JOINED WITH R2 AT
THAT SITE, SO THAT ONLY THE MATCHING TUPLES OF R2 NEED TO BE SENT BACK.
THE DECISION WHETHER TO REDUCE R1 OR R2 CAN ONLY BE MADE
AFTER COMPARING THE ADVANTAGES OF REDUCING R1 WITH THOSE OF REDUCING
R2. THUS, SEMI-JOIN IS AN EFFICIENT WAY TO REDUCE THE TRANSFER
OF DATA IN DISTRIBUTED QUERY PROCESSING.
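• THE SEMI-JOIN STEPS DESCRIBED ABOVE CAN BE SKETCHED IN PYTHON ON A FEW ROWS. THE SAMPLE ROWS, THE COLUMN LAYOUT AND THE FUNCTION NAME semi_join_reduce ARE HYPOTHETICAL AND ONLY ILLUSTRATE THE IDEA; THE TWO LISTS STAND IN FOR TABLES STORED AT DIFFERENT SITES.

```python
# R1 lives on site S1, R2 on site S2; DID is the joining column.
R1 = [("Asha", 10), ("Bilal", 20), ("Chen", 10)]              # (NAME, DID)
R2 = [(10, "Sales"), (20, "HR"), (30, "Legal"), (40, "IT")]   # (DID, DNAME)

def semi_join_reduce(r2, join_values):
    """Run at S2: keep only the R2 tuples whose DID was sent from S1."""
    return [row for row in r2 if row[0] in join_values]

# Step 1 (S1 -> S2): ship only the distinct joining-column values of R1.
shipped_column = {did for _, did in R1}
# Step 2 (at S2): reduce R2 to the tuples that can take part in the join.
reduced_r2 = semi_join_reduce(R2, shipped_column)
# Step 3 (S2 -> S1): ship the reduced R2 back and finish the join at S1.
dname = dict(reduced_r2)
result = [(name, dname[did]) for name, did in R1 if did in dname]

print(reduced_r2)  # only 2 of the 4 R2 tuples cross the network
print(result)
```

HERE ONLY TWO DID VALUES TRAVEL FROM S1 TO S2 AND ONLY TWO R2 TUPLES TRAVEL BACK, INSTEAD OF THE WHOLE OF R2.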
• EXAMPLE: FIND THE AMOUNT OF DATA TRANSFERRED TO EXECUTE THE SAME QUERY GIVEN
IN THE ABOVE EXAMPLE USING THE SEMI-JOIN OPERATION.
• ANSWER: THE FOLLOWING STRATEGY CAN BE USED TO EXECUTE THE QUERY.
• PROJECT THE REQUIRED ATTRIBUTES OF THE EMPLOYEE TABLE AT SITE 1 AND THEN
TRANSFER THEM TO SITE 3. FOR THIS, WE TRANSFER NAME AND DID (EMPLOYEE), THAT IS
20 + 10 = 30 BYTES PER RECORD, SO THE SIZE IS 30 * 1000 = 30,000 BYTES.
• TRANSFER THE TABLE DEPARTMENT TO SITE 3 AND JOIN THE PROJECTED ATTRIBUTES OF
EMPLOYEE WITH THIS TABLE. THE SIZE OF THE DEPARTMENT TABLE IS 30 * 50 = 1,500 BYTES.
• APPLYING THE ABOVE SCHEME, THE AMOUNT OF DATA TRANSFERRED TO EXECUTE THE QUERY
IS 30,000 + 1,500 = 31,500 BYTES.
• DISTRIBUTED SECURITY MODEL
• => IT HELPS PROTECT PROCESSES AND CHANNELS AGAINST UNAUTHORIZED ACCESS.
• 1) PROTECTING OBJECTS:
• => SPECIFIES WHO IS ALLOWED TO PERFORM ACTIONS ON AN OBJECT.
• 2) THREATS TO PROCESSES:
• => AN ATTACKER POSITIONED BETWEEN CLIENT AND SERVER CAN GAIN ACCESS TO THE SERVER.
• 3) THREATS TO COMMUNICATION CHANNELS:
• => A MALICIOUS USER CAN COPY, ALTER, OR INJECT MESSAGES ON THE CHANNEL.

Distributed Data Base.pptx

  • 2. • A CENTRALIZED DATABASE (SOMETIMES ABBREVIATED CDB) IS A DATABASE THAT IS LOCATED, STORED, AND MAINTAINED IN A SINGLE LOCATION. ... USERS ACCESS A CENTRALIZED DATABASE THROUGH A COMPUTER NETWORK WHICH IS ABLE TO GIVE THEM ACCESS TO THE CENTRAL CPU, WHICH IN TURN MAINTAINS TO THE DATABASE ITSELF.
  • 3. • LIST OF THE ADVANTAGES OF A CENTRALIZED DATABASE • IT ALLOWS FOR WORKING ON CROSS-FUNCTIONAL PROJECTS. ... • IT IS EASIER TO SHARE IDEAS ACROSS ANALYSTS. ... • ANALYSTS CAN BE ASSIGNED TO SPECIFIC PROBLEMS OR PROJECTS CENTRALLY. ... • HIGHER LEVELS OF SECURITY CAN BE OBTAINED. ... • HIGHER LEVELS OF DEPENDABILITY ARE PRESENT WITHIN THE SYSTEM.
  • 4. • DISADVANTAGES • CENTRALIZED DATABASES ARE HIGHLY DEPENDENT ON NETWORK CONNECTIVITY. ... • BOTTLENECKS CAN OCCUR AS A RESULT OF HIGH TRAFFIC. • LIMITED ACCESS BY MORE THAN ONE PERSON TO THE SAME SET OF DATA AS THERE IS ONLY ONE COPY OF IT AND IT IS MAINTAINED IN A SINGLE LOCATION.
  • 5. • DDB: A DISTRIBUTED DATABASE (DDB) IS A COLLECTION OF MULTIPLE, LOGICALLY INTERRELATED DATABASES DISTRIBUTED OVER A COMPUTER NETWORK. A DISTRIBUTED DATABASE MANAGEMENT SYSTEM (D–DBMS) IS THE SOFTWARE THAT MANAGES THE DDB AND PROVIDES AN ACCESS MECHANISM THAT MAKES THIS DISTRIBUTION TRANSPARENT TO THE USERS. DISTRIBUTED DATABASE SYSTEM (DDBS) = DDB + D–DBMS
  • 6. • ADVANTAGES OF DISTRIBUTED DATABASE SYSTEM • RELIABLE • IN DISTRIBUTED DATABASE MANAGEMENT SYSTEM, IF ANY CONNECTED SYSTEM FAILS TO DO WORK THEN THERE IS NO EFFECT ON THE PERFORMANCE OF THE SYSTEM. IT CONTINUES FUNCTIONING AND IT IS MORE RELIABLE THAN OTHER SIMPLE DATABASE MANAGEMENT SYSTEM. • LOW COMMUNICATION COST • DATA AND INFORMATION IS STORED LOCALLY IN DISTRIBUTED DATABASE MANAGEMENT SYSTEM. ITS COMMUNICATION COST AND DATA MANIPULATION BECOME EASY AND LESS COSTLY. • MODULAR DEVELOPMENT • MODULATION IN DISTRIBUTED DATABASE MANAGEMENT SYSTEM IS SO EASY. MORE SYSTEMS CAN BE MANIPULATED AND INSTALLED BY JUST INSTALLING AND CONNECTING WITH THE DISTRIBUTED DATABASE SYSTEM WITH NO INTERRUPTION AND FAILURE.
  • 7. • DATA RECOVERY • DATA CAN BE EASILY RECOVERED IN DISTRIBUTED DATABASE MANAGEMENT SYSTEMS.
  • 8. • DISADVANTAGES OF DISTRIBUTED DATABASE SYSTEM • DATA INTEGRITY • DATA IS UPDATED ON MULTIPLE SITES CAN CAUSE PROBLEMS. DATA INTEGRITY IS MORE COMPLEX AND VERY HARD TO HANDLE. • DUPLICATION OF DATA • SAME TYPE OF DATA IS STORED IN DIFFERENT SYSTEMS MAKE DUPLICATION OF DATA. IT TAKES MUCH SPACE TO STORE THE SAME DATA IN DIFFERENT COMPUTER SYSTEMS IN DISTRIBUTED DATABASE MANAGEMENT SYSTEMS.
  • 9. • IMPROPER DATA DISTRIBUTION • IMPROPER DATA DISTRIBUTION CAN LEAD TO SLOW RESPONSE IN PROCESSING OF QUERY. SAME DATA IS STORED IN DIFFERENT COMPUTERS CAN CREATE MORE PROBLEMS IN DISTRIBUTED DATABASE MANAGEMENT SYSTEMS. • LESS PROCESSING SPEED • MUCH COMMUNICATION IS NEEDED TO A SIMPLE QUERY TO PERFORM. IN THIS REASON AMPLE TIME PERIOD IS REQUIRED TO SOLVE A SPECIFIC PROBLEM. • SECURITY PROBLEM.
  • 10. • DESIGN ISSUES OF DISTRIBUTED SYSTEM : • 1) HETEROGENEITY : HETEROGENEITY IS APPLIED TO THE NETWORK, COMPUTER HARDWARE, OPERATING SYSTEM AND IMPLEMENTATION OF DIFFERENT DEVELOPERS. A KEY COMPONENT OF THE HETEROGENEOUS DISTRIBUTED SYSTEM CLIENT-SERVER ENVIRONMENT IS MIDDLEWARE. MIDDLEWARE IS A SET OF SERVICES THAT ENABLES APPLICATION AND END-USER TO INTERACTS WITH EACH OTHER ACROSS A HETEROGENEOUS DISTRIBUTED SYSTEM.
  • 11. • 2) OPENNESS: THE OPENNESS OF THE DISTRIBUTED SYSTEM IS DETERMINED PRIMARILY BY THE DEGREE TO WHICH NEW RESOURCE-SHARING SERVICES CAN BE MADE AVAILABLE TO THE USERS. OPEN SYSTEMS ARE CHARACTERIZED BY THE FACT THAT THEIR KEY INTERFACES ARE PUBLISHED. IT IS BASED ON A UNIFORM COMMUNICATION MECHANISM AND PUBLISHED INTERFACE FOR ACCESS TO SHARED RESOURCES. IT CAN BE CONSTRUCTED FROM HETEROGENEOUS HARDWARE AND SOFTWARE.
  • 12. • 3) SCALABILITY: SCALABILITY OF THE SYSTEM SHOULD REMAIN EFFICIENT EVEN WITH A SIGNIFICANT INCREASE IN THE NUMBER OF USERS AND RESOURCES CONNECTED. • 4) SECURITY : SECURITY OF INFORMATION SYSTEM HAS THREE COMPONENTS CONFIDENTIALLY, INTEGRITY AND AVAILABILITY. ENCRYPTION PROTECTS SHARED RESOURCES, KEEPS SENSITIVE INFORMATION SECRETS WHEN TRANSMITTED.
  • 13. • 5) TRANSPARENCY : TRANSPARENCY ENSURES THAT THE DISTRIBUTES SYSTEM SHOULD BE PERCEIVED AS A SINGLE ENTITY BY THE USERS OR THE APPLICATION PROGRAMMERS RATHER THAN THE COLLECTION OF AUTONOMOUS SYSTEMS, WHICH IS COOPERATING. THE USER SHOULD BE UNAWARE OF WHERE THE SERVICES ARE LOCATED AND THE TRANSFERRING FROM A LOCAL MACHINE TO A REMOTE ONE SHOULD BE TRANSPARENT.
  • 14. • 7) CONCURRENCY: THERE IS A POSSIBILITY THAT SEVERAL CLIENTS WILL ATTEMPT TO ACCESS A SHARED RESOURCE AT THE SAME TIME. MULTIPLE USERS MAKE REQUESTS ON THE SAME RESOURCES, I.E READ, WRITE, AND UPDATE. EACH RESOURCE MUST BE SAFE IN A CONCURRENT ENVIRONMENT. ANY OBJECT THAT REPRESENTS A SHARED RESOURCE A DISTRIBUTED SYSTEM MUST ENSURE THAT IT OPERATES CORRECTLY IN A CONCURRENT ENVIRONMENT.
  • 15. • TYPES OF TRANSPARENCY: • 1)LOCATION TRANSPARENCY: • LOCATION TRANSPARENCY ENSURES THAT THE USER CAN QUERY ON ANY TABLE(S) OR FRAGMENT(S) OF A TABLE AS IF THEY WERE STORED LOCALLY IN THE USER'S SITE. THE FACT THAT THE TABLE OR ITS FRAGMENTS ARE STORED AT REMOTE SITE IN THE DISTRIBUTED DATABASE SYSTEM, SHOULD BE COMPLETELY OBLIVIOUS TO THE END USER.
  • 16. • 2)REPLICATION TRANSPARENCE: • REPLICATION TRANSPARENCY ENSURES THAT REPLICATION OF DATABASES ARE HIDDEN FROM THE USERS. IT ENABLES USERS TO QUERY UPON A TABLE AS IF ONLY A SINGLE COPY OF THE TABLE EXISTS. ... ALSO, IN CASE OF FAILURE OF A SITE, THE USER CAN STILL PROCEED WITH HIS QUERIES USING REPLICATED COPIES WITHOUT ANY KNOWLEDGE OF FAILURE.
  • 17. • NAMING TRANSPARENCY: • A TRANSPARENCY IS SOME ASPECT OF THE DISTRIBUTED SYSTEM THAT IS HIDDEN FROM THE USER (PROGRAMMER, SYSTEM DEVELOPER, USER OR APPLICATION PROGRAM). A TRANSPARENCY IS PROVIDED BY INCLUDING SOME SET OF MECHANISMS IN THE DISTRIBUTED SYSTEM AT A LAYER BELOW THE INTERFACE WHERE THE TRANSPARENCY IS REQUIRED.
  • 18. • A RELATIONAL DATABASE IS A TYPE OF DATABASE THAT STORES AND PROVIDES ACCESS TO DATA POINTS THAT ARE RELATED TO ONE ANOTHER. ... THE COLUMNS OF THE TABLE HOLD ATTRIBUTES OF THE DATA, AND EACH RECORD USUALLY HAS A VALUE FOR EACH ATTRIBUTE, MAKING IT EASY TO ESTABLISH THE RELATIONSHIPS AMONG DATA POINTS. • PROPERTIES OF RELATIONAL DATABASES • VALUES ARE ATOMIC. • ALL OF THE VALUES IN A COLUMN HAVE THE SAME DATA TYPE. • EACH ROW IS UNIQUE. • THE SEQUENCE OF COLUMNS IS INSIGNIFICANT.
  • 19. • THE SEQUENCE OF ROWS IS INSIGNIFICANT. • EACH COLUMN HAS A UNIQUE NAME. • INTEGRITY CONSTRAINTS MAINTAIN DATA CONSISTENCY ACROSS MULTIPLE TABLES.
  • 20. • CLIENT-SERVER ARCHITECTURE IS A COMPUTING MODEL IN WHICH THE SERVER HOSTS, DELIVERS AND MANAGES MOST OF THE RESOURCES AND SERVICES TO BE CONSUMED BY THE CLIENT. THIS TYPE OF ARCHITECTURE HAS ONE OR MORE CLIENT COMPUTERS CONNECTED TO A CENTRAL SERVER OVER A NETWORK OR INTERNET CONNECTION. • PEER-TO-PEER ARCHITECTURE (P2P ARCHITECTURE) IS A COMMONLY USED COMPUTER NETWORKING ARCHITECTURE IN WHICH EACH WORKSTATION, OR NODE, HAS THE SAME CAPABILITIES AND RESPONSIBILITIES. IT IS OFTEN COMPARED AND CONTRASTED TO THE CLASSIC CLIENT/SERVER ARCHITECTURE, IN WHICH SOME COMPUTERS ARE DEDICATED TO SERVING OTHERS
  • 21. • 3)MULTI DBMS ARCHITECTURE: THIS IS AN INTEGRATED DATABASE SYSTEM FORMED BY A COLLECTION OF TWO OR MORE AUTONOMOUS DATABASE SYSTEMS. MULTI-DBMS CAN BE EXPRESSED THROUGH SIX LEVELS OF SCHEMAS − MULTI-DATABASE VIEW LEVEL − DEPICTS MULTIPLE USER VIEWS COMPRISING OF SUBSETS OF THE INTEGRATED DISTRIBUTED DATABASE
  • 22. • TWO TYPES OF SERVER CLIENT ARCHETECTURE: • 1)SINGLE SERVER MULTIPLE CLIENT: A MULTIPLE CLIENT SERVER IS A TYPE OF SOFTWARE ARCHITECTURE FOR COMPUTER NETWORKS WHERE CLIENTS, WHICH CAN BE BASIC WORKSTATIONS OR FULLY FUNCTIONAL PERSONAL COMPUTERS, REQUEST INFORMATION FROM A SERVER COMPUTER. ... ONE SERVER IS ABLE TO HANDLE DOZENS OF INFORMATION REQUESTS FROM CLIENT COMPUTERS SIMULTANEOUSLY.
  • 23. • 2)MULTIPLE SERVER MULTIPLE CLIENT:A MULTIPLE CLIENT SERVER IS A TYPE OF SOFTWARE ARCHITECTURE FOR COMPUTER NETWORKS WHERE CLIENTS REQUEST INFORMATION FROM A SERVER COMPUTER. THE MOST COMMON TYPE OF MULTIPLE CLIENT SERVER SYSTEM FOR SMALL BUSINESSES AND HOMES IS THE SINGLE SERVER WITH MULTIPLE CLIENTS.
  • 24. • PEER- TO-PEER ARCHITECTURE FOR DDBMS • IN THESE SYSTEMS, EACH PEER ACTS BOTH AS A CLIENT AND A SERVER FOR IMPARTING DATABASE SERVICES. THE PEERS SHARE THEIR RESOURCE WITH OTHER PEERS AND CO- ORDINATE THEIR ACTIVITIES. • THIS ARCHITECTURE GENERALLY HAS FOUR LEVELS OF SCHEMAS − • GLOBAL CONCEPTUAL SCHEMA − DEPICTS THE GLOBAL LOGICAL VIEW OF DATA. • LOCAL CONCEPTUAL SCHEMA − DEPICTS LOGICAL DATA ORGANIZATION AT EACH SITE. • LOCAL INTERNAL SCHEMA − DEPICTS PHYSICAL DATA ORGANIZATION AT EACH SITE. • EXTERNAL SCHEMA − DEPICTS USER VIEW OF DATA.
  • 25.
  • 26. • MULTI - DBMS ARCHITECTURES • THIS IS AN INTEGRATED DATABASE SYSTEM FORMED BY A COLLECTION OF TWO OR MORE AUTONOMOUS DATABASE SYSTEMS. • MULTI-DBMS CAN BE EXPRESSED THROUGH SIX LEVELS OF SCHEMAS − • MULTI-DATABASE VIEW LEVEL − DEPICTS MULTIPLE USER VIEWS COMPRISING OF SUBSETS OF THE INTEGRATED DISTRIBUTED DATABASE. • MULTI-DATABASE CONCEPTUAL LEVEL − DEPICTS INTEGRATED MULTI-DATABASE THAT COMPRISES OF GLOBAL LOGICAL MULTI-DATABASE STRUCTURE DEFINITIONS. • MULTI-DATABASE INTERNAL LEVEL − DEPICTS THE DATA DISTRIBUTION ACROSS DIFFERENT SITES AND MULTI- DATABASE TO LOCAL DATA MAPPING. • LOCAL DATABASE VIEW LEVEL − DEPICTS PUBLIC VIEW OF LOCAL DATA.
  • 27. • LOCAL DATABASE CONCEPTUAL LEVEL − DEPICTS LOCAL DATA ORGANIZATION AT EACH SITE. • LOCAL DATABASE INTERNAL LEVEL − DEPICTS PHYSICAL DATA ORGANIZATION AT EACH SITE. • THERE ARE TWO DESIGN ALTERNATIVES FOR MULTI-DBMS − • MODEL WITH MULTI-DATABASE CONCEPTUAL LEVEL. • MODEL WITHOUT MULTI-DATABASE CONCEPTUAL LEVEL.
  • 28.
  • 29.
  • 30. • WHAT IS DATA DISTRIBUTION STRATEGY: • DISTRIBUTION STRATEGY IS THAT BY ALLOCATING DIFFERENT. RESOURCES, E.G. NUMBER OF DATABASE NODES, TO DIFFERENT. CLASSES OF USERS, WE CAN ROUTE THE DATABASE REQUESTS TO. DIFFERENT RESOURCES.
  • 31. • DATA REPLICATION • DATA REPLICATION IS THE PROCESS OF STORING SEPARATE COPIES OF THE DATABASE AT TWO OR MORE SITES. IT IS A POPULAR FAULT TOLERANCE TECHNIQUE OF DISTRIBUTED DATABASES.
  • 32. • ADVANTAGES OF DATA REPLICATION • RELIABILITY − IN CASE OF FAILURE OF ANY SITE, THE DATABASE SYSTEM CONTINUES TO WORK SINCE A COPY IS AVAILABLE AT ANOTHER SITE(S). • REDUCTION IN NETWORK LOAD − SINCE LOCAL COPIES OF DATA ARE AVAILABLE, QUERY PROCESSING CAN BE DONE WITH REDUCED NETWORK USAGE, PARTICULARLY DURING PRIME HOURS. DATA UPDATING CAN BE DONE AT NON- PRIME HOURS.
  • 33. • QUICKER RESPONSE − AVAILABILITY OF LOCAL COPIES OF DATA ENSURES QUICK QUERY PROCESSING AND CONSEQUENTLY QUICK RESPONSE TIME. • SIMPLER TRANSACTIONS − TRANSACTIONS REQUIRE LESS NUMBER OF JOINS OF TABLES LOCATED AT DIFFERENT SITES AND MINIMAL COORDINATION ACROSS THE NETWORK. THUS, THEY BECOME SIMPLER IN NATURE.
  • 34. • DISADVANTAGES OF DATA REPLICATION • INCREASED STORAGE REQUIREMENTS − MAINTAINING MULTIPLE COPIES OF DATA IS ASSOCIATED WITH INCREASED STORAGE COSTS. THE STORAGE SPACE REQUIRED IS IN MULTIPLES OF THE STORAGE REQUIRED FOR A CENTRALIZED SYSTEM. • INCREASED COST AND COMPLEXITY OF DATA UPDATING − EACH TIME A DATA ITEM IS UPDATED, THE UPDATE NEEDS TO BE REFLECTED IN ALL THE COPIES OF THE DATA AT THE DIFFERENT SITES. THIS REQUIRES COMPLEX SYNCHRONIZATION TECHNIQUES AND PROTOCOLS.
  • 35. • UNDESIRABLE APPLICATION – DATABASE COUPLING : • IF COMPLEX UPDATE MECHANISMS ARE NOT USED, REMOVING DATA INCONSISTENCY REQUIRES COMPLEX CO-ORDINATION AT APPLICATION LEVEL. THIS RESULTS IN UNDESIRABLE APPLICATION – DATABASE COUPLING.
  • 36. • UPDATING DISTRIBUTED DATA • SYNCHRONOUS REPLICATION CONTROL • IN SYNCHRONOUS REPLICATION APPROACH, THE DATABASE IS SYNCHRONIZED SO THAT ALL THE REPLICATIONS ALWAYS HAVE THE SAME VALUE. A TRANSACTION REQUESTING A DATA ITEM WILL HAVE ACCESS TO THE SAME VALUE IN ALL THE SITES. •
  • 37. • ASYNCHRONOUS REPLICATION CONTROL • IN ASYNCHRONOUS REPLICATION APPROACH, THE REPLICAS DO NOT ALWAYS MAINTAIN THE SAME VALUE. ONE OR MORE REPLICAS MAY STORE AN OUTDATED VALUE, AND A TRANSACTION CAN SEE THE DIFFERENT VALUES. THE PROCESS OF BRINGING ALL THE REPLICAS TO THE CURRENT VALUE IS CALLED SYNCHRONIZATION.
  • 38. • FRAGMENTATION. FRAGMENTATION IS THE TASK OF DIVIDING A TABLE INTO A SET OF SMALLER TABLES. THE SUBSETS OF THE TABLE ARE CALLED FRAGMENTS. FRAGMENTATION CAN BE OF THREE TYPES: HORIZONTAL, VERTICAL, AND HYBRID (COMBINATION OF HORIZONTAL AND VERTICAL).
  • 39. • WHAT ARE THE ADVANTAGES OF USING FRAGMENTATION: • THE MAIN ADVANTAGE OF FRAGMENTATION IS TO IMPROVE THE PERFORMANCE OF DISTRIBUTED DATABASE DESIGN BY INCREASING THE EFFICIENCY SINCE DATA IS STORED ONLY WHERE IT IS NEEDED. FRAGMENTS CAN BE ALLOCATED AT DIFFERENT NETWORK SITES IN A PROCESS CALLED DATA ALLOCATION.
  • 40. • WHY WE USED FRAGMENTATION: • FRAGMENTATION IS A DATABASE SERVER FEATURE THAT ALLOWS YOU TO CONTROL WHERE DATA IS STORED AT THE TABLE LEVEL. FRAGMENTATION ENABLES YOU TO DEFINE GROUPS OF ROWS OR INDEX KEYS WITHIN A TABLE ACCORDING TO SOME ALGORITHM OR SCHEME . ... YOU CAN USE THIS TABLE TO ACCESS INFORMATION ABOUT YOUR FRAGMENTED TABLES AND INDEXES.
  • 41. • 1)HORIZONTAL FRAGMENTATION: HF INVOLVES TAKING ROWS (RECORDS) FROM A TABLE AND PLACING DIFFERENT ROWS AT DIFFERENT NODES (LOCATIONS). FOR EXAMPLE, THE CUSTOMER TABLE MAY BE FRAGMENTED SUCH THAT THE CUSTOMERS FOR A GIVEN OFFICE ARE STORED AT THAT OFFICE.
  • 42. • CORRECTNESS RULES OF FRAGMENTATION • 1)COMPLETENESS: TO ENSURE THAT THERE IS NO LOSS OF DATA DUE TO FRAGMENTATION. COMPLETENESS PROPERTY ENSURES THIS BY CHECKING WHETHER ALL THE RECORDS WHICH WERE PART OF A TABLE (BEFORE FRAGMENTATION) ARE FOUND IN AT LEAST ONE OF THE FRAGMENTS AFTER FRAGMENTATION.
  • 43. • 2)RECONSTRACTION: THIS RULE ENSURES THE ABILITY TO RE-CONSTRUCT THE ORIGINAL TABLE FROM THE FRAGMENTS THAT ARE CREATED. THIS RULE IS TO CHECK WHETHER THE FUNCTIONAL DEPENDENCIES ARE PRESERVED OR NOT. • 2)DISJOINT: THIS RULE ENSURES THAT NO RECORD WILL BECOME A PART OF TWO OR MORE DIFFERENT FRAGMENTS DURING THE FRAGMENTATION PROCESS. IF A TABLE R IS PARTITIONED INTO FRAGMENTS R1, R2, …, RN, THEN DISJOINTNESS INSISTS THE FOLLOWING; • R1 ∩ R2 ∩ … ∩ RN = NULL SET
  • 44. • TYPES OF HORIZONTAL FRAGMENTATION: • 1)PRIMARY HF. • 2)DERIVED HF. • EXAMPLE OF HORIZONTAL FRAGMENTATION OF DATA FOR DISTRIBUTED DATABASE
  • 45. • PRIMARY HORIZONTAL FRAGMENTATION (PHF) • PRIMARY HORIZONTAL FRAGMENTATION IS A TABLE FRAGMENTATION TECHNIQUE IN WHICH WE FRAGMENT A SINGLE TABLE ROW-WISE USING A SET OF SIMPLE CONDITIONS. • NOTE: CONDITIONS ARE ALSO CALLED PREDICATES. • SIMPLE PREDICATE
  • 46. • GIVEN A TABLE/RELATION R WITH SET OF ATTRIBUTES [A1, A2, A3, A4, …, AN], A SIMPLE PREDICATE PI CAN BE EXPRESSED AS FOLLOWS: • PI : AJ Θ VALUE • WHERE Θ CAN BE ANY OF THE SYMBOLS IN THE SET {≤, ≥, ≠, <, >, =}, AND VALUE CAN BE ANY VALUE STORED IN THE TABLE FOR THE ATTRIBUTE AJ. FOR EXAMPLE, CONSIDER THE FOLLOWING TABLE STUDENT GIVEN IN FIGURE 1:
  • 47.
      ROLLNO  MARKS  UNIVERSITY
      T01     33     Harvard
      T03     77     Stanford
      T04     23     California
      T02     89     California
      T05     90     Harvard
      T06     90     Harvard
      T07     15     Stanford
  • 48. • FIGURE 1: STUDENT TABLE • FOR THE ABOVE TABLE, WE COULD DEFINE SIMPLE PREDICATES SUCH AS UNIVERSITY = ‘CALIFORNIA’, UNIVERSITY = ‘HARVARD’, MARKS < 77, ETC. USING THE ABOVE EXPRESSION “AJ Θ VALUE”. • SET OF SIMPLE PREDICATES • A SET OF SIMPLE PREDICATES IS THE SET OF ALL CONDITIONS COLLECTIVELY REQUIRED TO FRAGMENT A RELATION INTO SUBSETS. FOR A TABLE R, THE SET OF SIMPLE PREDICATES CAN BE DEFINED AS:
  • 49. • PREDICATE P = { P1, P2, …, PN} • EXAMPLE 1 • AS AN EXAMPLE, FOR THE ABOVE TABLE STUDENT, IF SIMPLE CONDITIONS ARE, MARKS < 77, MARKS ≥ 77, THEN, • SET OF SIMPLE PREDICATES P1 = {MARKS < 77, MARKS ≥ 77}
  • 50. • MIN-TERM PREDICATE • WHEN WE FRAGMENT ANY RELATION HORIZONTALLY, WE USE A SINGLE CONDITION OR A SET OF SIMPLE PREDICATES TO FILTER THE DATA. GIVEN A RELATION R AND A SET OF SIMPLE PREDICATES, WE CAN FRAGMENT THE RELATION HORIZONTALLY AS FOLLOWS: • FRAGMENT RI = σFI(R), 1 ≤ I ≤ N • WHERE σ IS THE SELECTION OPERATOR AND FI IS THE SELECTION FORMULA, A CONJUNCTION OF SIMPLE PREDICATES ALSO CALLED A MIN-TERM PREDICATE, WHICH CAN BE WRITTEN AS FOLLOWS:
  • 51. • MIN-TERM PREDICATE, MI = P1* ∧ P2* ∧ P3* ∧ … ∧ PN* • HERE, EACH PK* STANDS FOR EITHER PK OR ITS NEGATION ¬(PK). USING THE CONJUNCTIVE FORM OF THE SIMPLE PREDICATES IN DIFFERENT COMBINATIONS, WE CAN DERIVE MANY SUCH MIN-TERM PREDICATES. • FOR EXAMPLE 1 STATED PREVIOUSLY, WE CAN DERIVE THE SET OF MIN-TERM PREDICATES USING THE RULES STATED ABOVE AS FOLLOWS: • WE WILL GET 2^N MIN-TERM PREDICATES, WHERE N IS THE NUMBER OF SIMPLE PREDICATES IN THE GIVEN PREDICATE SET. FOR P1, WE HAVE 2 SIMPLE PREDICATES. HENCE, WE WILL GET 4 (2^2) POSSIBLE MIN-TERM PREDICATES AS FOLLOWS: • M1 = {MARKS < 77 ∧ MARKS ≥ 77}
  • 52. • M2 = {MARKS < 77 ∧ ¬(MARKS ≥ 77)} • M3 = {¬(MARKS < 77) ∧ MARKS ≥ 77} • M4 = {¬(MARKS < 77) ∧ ¬(MARKS ≥ 77)} • OUR NEXT STEP IS TO KEEP THE MIN-TERM PREDICATES THAT CAN ACTUALLY BE SATISFIED AND ELIMINATE THE OTHERS, WHICH ARE NOT USEFUL. FOR EXAMPLE, EACH MIN-TERM PREDICATE ABOVE CAN BE APPLIED AS THE FORMULA FI IN THE RULE FOR FRAGMENT RI AS FOLLOWS: • STUDENT1 = σMARKS < 77 ∧ MARKS ≥ 77(STUDENT) • WHICH CAN BE WRITTEN AS THE EQUIVALENT SQL QUERY: • STUDENT1 • SELECT * FROM STUDENT WHERE MARKS < 77 AND MARKS >= 77; • THIS CONDITION IS A CONTRADICTION, SO STUDENT1 IS ALWAYS EMPTY AND M1 IS ELIMINATED.
  • 53. • STUDENT2 = σMARKS < 77 ∧ ¬(MARKS ≥ 77)(STUDENT) • WHICH CAN BE WRITTEN AS THE EQUIVALENT SQL QUERY: • STUDENT2 • SELECT * FROM STUDENT WHERE MARKS < 77 AND NOT (MARKS >= 77); • WHERE NOT (MARKS >= 77) IS EQUIVALENT TO MARKS < 77, SO STUDENT2 HOLDS ALL ROWS WITH MARKS < 77.
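  • THE ENUMERATION OF MIN-TERM PREDICATES FOR EXAMPLE 1 CAN BE SKETCHED IN PYTHON AS FOLLOWS. THIS IS AN ILLUSTRATION UNDER ASSUMPTIONS: THE ROWS ARE COPIED FROM THE STUDENT TABLE OF FIGURE 1, WHILE THE NAMES `simple` AND `fragments` AND THE RULE OF DROPPING MIN-TERMS THAT SELECT NO ROWS FROM THIS PARTICULAR TABLE ARE CHOICES MADE FOR THE SKETCH.

```python
from itertools import product

# STUDENT rows from Figure 1: (RollNo, Marks, University).
STUDENT = [
    ("T01", 33, "Harvard"), ("T03", 77, "Stanford"), ("T04", 23, "California"),
    ("T02", 89, "California"), ("T05", 90, "Harvard"), ("T06", 90, "Harvard"),
    ("T07", 15, "Stanford"),
]

# Simple predicates of Example 1, each as (label, test on a row).
simple = [
    ("MARKS < 77", lambda r: r[1] < 77),
    ("MARKS >= 77", lambda r: r[1] >= 77),
]

fragments = {}
# A min-term takes every simple predicate either as-is (True) or negated (False),
# so there are 2 ** len(simple) combinations to try.
for signs in product([True, False], repeat=len(simple)):
    label = " AND ".join(
        name if keep else f"NOT ({name})"
        for (name, _), keep in zip(simple, signs)
    )
    rows = [
        r for r in STUDENT
        if all(test(r) == keep for (_, test), keep in zip(simple, signs))
    ]
    if rows:  # min-terms that select nothing (e.g. contradictions) are dropped
        fragments[label] = rows

for label, rows in fragments.items():
    print(label, [r[0] for r in rows])
```

  • OF THE FOUR COMBINATIONS, ONLY M2 AND M3 SELECT ANY ROWS, SO THE STUDENT TABLE ENDS UP WITH TWO FRAGMENTS.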
  • 54. • DERIVED HORIZONTAL FRAGMENTATION: THE PROCESS OF CREATING HORIZONTAL FRAGMENTS OF A TABLE BASED ON THE ALREADY CREATED HORIZONTAL FRAGMENTS OF ANOTHER RELATION (FOR EXAMPLE, A BASE TABLE) IS CALLED DERIVED HORIZONTAL FRAGMENTATION. ... FOR EXAMPLE, CONSIDER A RELATION WHICH IS CONNECTED WITH ANOTHER RELATION THROUGH A FOREIGN KEY.
  • 55. • CONSIDER AN EXAMPLE WHERE AN ORGANIZATION MAINTAINS INFORMATION ABOUT ITS CUSTOMERS. IT STORES INFORMATION ABOUT THE CUSTOMER IN THE CUSTOMER TABLE AND THE CUSTOMER ADDRESSES IN THE C_ADDRESS TABLE AS FOLLOWS: • CUSTOMER(CID, CNAME, PROD_PURCHASED, SHOP_LOCATION) • C_ADDRESS(CID, C_ADDRESS) • THE TABLE CUSTOMER STORES INFORMATION ABOUT THE CUSTOMER, THE PRODUCT PURCHASED FROM THE SHOP, AND THE SHOP LOCATION WHERE THE PRODUCT WAS PURCHASED. C_ADDRESS STORES THE PERMANENT AND PRESENT ADDRESSES OF THE CUSTOMER. HERE, CUSTOMER IS THE OWNER RELATION AND C_ADDRESS IS THE MEMBER RELATION.
  • 56.
      CID   CNAME     PROD_PURCHASED   SHOP_LOCATION
      C001  Ram       Air Conditioner  Mumbai
      C002  Guru      Television       Chennai
      C010  Murugan   Television       Coimbatore
      C003  Yuvraj    DVD Player       Pune
      C004  Gopinath  Washing machine  Coimbatore
  • 57.
      CID   C_ADDRESS
      C001  Bandra, Mumbai
      C001  XYZ, Pune
      C002  T.Nagar, Chennai
      C002  Kovil street, Madurai
      C003  ABX, Pune
      C004  Gandhipuram, Ooty
      C004  North street, Erode
      C010  Peelamedu, Coimbatore
  • 58. • IF THE ORGANIZATION FRAGMENTS THE RELATION CUSTOMER ON THE SHOP_LOCATION ATTRIBUTE, IT NEEDS TO CREATE 4 FRAGMENTS USING THE HORIZONTAL FRAGMENTATION TECHNIQUE, AS GIVEN IN FIGURE 3 BELOW.
      CUSTOMER1
      CID   CNAME     PROD_PURCHASED   SHOP_LOCATION
      C001  Ram       Air Conditioner  Mumbai
  • 59.
      CUSTOMER2
      CID   CNAME     PROD_PURCHASED   SHOP_LOCATION
      C002  Guru      Television       Chennai

      CUSTOMER3
      CID   CNAME     PROD_PURCHASED   SHOP_LOCATION
      C010  Murugan   Television       Coimbatore
      C004  Gopinath  Washing machine  Coimbatore
  • 60.
      CUSTOMER4
      CID   CNAME     PROD_PURCHASED   SHOP_LOCATION
      C003  Yuvraj    DVD Player       Pune
  • 61. • NOW, IT IS NECESSARY TO FRAGMENT THE SECOND RELATION C_ADDRESS BASED ON THE FRAGMENTS CREATED ON THE CUSTOMER RELATION, BECAUSE IF WE FRAGMENT C_ADDRESS IN ANY OTHER WAY, RELATED DATA MAY END UP AT DIFFERENT LOCATIONS. FOR EXAMPLE, IF C_ADDRESS IS FRAGMENTED ON THE LAST DIGIT OF THE CID ATTRIBUTE, IT WILL END UP WITH A LARGER NUMBER OF FRAGMENTS, AND THE DATA MAY NOT BE STORED AT THE SAME LOCATION AS THE CUSTOMER INFORMATION. THAT IS, THE INFORMATION FOR CUSTOMER ‘RAM’ IS STORED IN MUMBAI WHILE HIS ADDRESS INFORMATION MIGHT BE STORED SOMEWHERE ELSE. TO AVOID THIS, THE TABLE C_ADDRESS, WHICH IS A MEMBER TABLE OF CUSTOMER, MUST BE FRAGMENTED INTO FOUR FRAGMENTS BASED ON THE CUSTOMER TABLE FRAGMENTS GIVEN IN FIGURE 3. THIS TYPE OF FRAGMENTATION BASED ON THE OWNER RELATION IS CALLED DERIVED HORIZONTAL FRAGMENTATION. IT WORKS FOR RELATIONS THAT ARE JOINED BY AN EQUI-JOIN, BECAUSE AN EQUI-JOIN CAN BE REPRESENTED AS A SET OF SEMI-JOINS.
  • 62. • THE FRAGMENTATION OF C_ADDRESS IS DONE AS A SET OF SEMI-JOINS AS FOLLOWS. • C_ADDRESS1 = C_ADDRESS ⋉ CUSTOMER1 • C_ADDRESS2 = C_ADDRESS ⋉ CUSTOMER2 • C_ADDRESS3 = C_ADDRESS ⋉ CUSTOMER3 • C_ADDRESS4 = C_ADDRESS ⋉ CUSTOMER4
  • 63. • THIS WILL RESULT IN FOUR FRAGMENTS OF C_ADDRESS, WHERE THE ADDRESSES OF ALL CUSTOMERS OF FRAGMENT CUSTOMER1 GO INTO C_ADDRESS1, THE ADDRESSES OF ALL CUSTOMERS OF FRAGMENT CUSTOMER2 GO INTO C_ADDRESS2, AND SO ON. THE RESULTING FRAGMENTS OF C_ADDRESS ARE THE FOLLOWING. • FIGURE 4: DERIVED HORIZONTAL FRAGMENTS OF FIGURE 2 AS A MEMBER RELATION OF THE OWNER RELATION’S FRAGMENTS FROM FIGURE 3
  • 64.
      C_ADDRESS1
      CID   C_ADDRESS
      C001  Bandra, Mumbai
      C001  XYZ, Pune

      C_ADDRESS2
      CID   C_ADDRESS
      C002  T.Nagar, Chennai
      C002  Kovil street, Madurai

      C_ADDRESS3
      CID   C_ADDRESS
      C004  Gandhipuram, Ooty
      C004  North street, Erode
      C010  Peelamedu, Coimbatore

      C_ADDRESS4
      CID   C_ADDRESS
      C003  ABX, Pune
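  • THE DERIVATION OF THE C_ADDRESS FRAGMENTS BY SEMI-JOIN CAN BE REPRODUCED WITH A SHORT PYTHON SKETCH. THE ROWS ARE COPIED FROM THE CUSTOMER AND C_ADDRESS TABLES ABOVE; THE FUNCTION NAME `semi_join` AND THE LIST NAMES ARE HYPOTHETICAL, AND THE TABLES ARE HELD AS IN-MEMORY LISTS RATHER THAN AT SEPARATE SITES.

```python
# Rows copied from the CUSTOMER and C_ADDRESS tables above.
CUSTOMER = [
    ("C001", "Ram", "Air Conditioner", "Mumbai"),
    ("C002", "Guru", "Television", "Chennai"),
    ("C010", "Murugan", "Television", "Coimbatore"),
    ("C003", "Yuvraj", "DVD Player", "Pune"),
    ("C004", "Gopinath", "Washing machine", "Coimbatore"),
]
C_ADDRESS = [
    ("C001", "Bandra, Mumbai"), ("C001", "XYZ, Pune"),
    ("C002", "T.Nagar, Chennai"), ("C002", "Kovil street, Madurai"),
    ("C003", "ABX, Pune"),
    ("C004", "Gandhipuram, Ooty"), ("C004", "North street, Erode"),
    ("C010", "Peelamedu, Coimbatore"),
]


def semi_join(member, owner_fragment):
    """member ⋉ owner_fragment on CID: keep the member rows whose CID
    occurs in the owner fragment (only member columns are returned)."""
    owner_cids = {row[0] for row in owner_fragment}
    return [row for row in member if row[0] in owner_cids]


# Primary horizontal fragmentation of the owner relation on SHOP_LOCATION.
locations = ["Mumbai", "Chennai", "Coimbatore", "Pune"]
customer_frags = [[c for c in CUSTOMER if c[3] == loc] for loc in locations]

# Derived horizontal fragmentation of the member relation.
address_frags = [semi_join(C_ADDRESS, frag) for frag in customer_frags]

for number, frag in enumerate(address_frags, start=1):
    print(f"C_ADDRESS{number}", frag)
```

  • EVERY ADDRESS ROW LANDS IN EXACTLY ONE FRAGMENT BECAUSE EACH CID BELONGS TO EXACTLY ONE CUSTOMER FRAGMENT.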
  • 66. • CHECKING FOR CORRECTNESS • COMPLETENESS: CHECKING THE COMPLETENESS OF A DERIVED HORIZONTAL FRAGMENTATION IS MORE DIFFICULT THAN FOR A PRIMARY HORIZONTAL FRAGMENTATION, BECAUSE THE PREDICATES USED DETERMINE THE FRAGMENTATION OF TWO RELATIONS. FORMALLY, LET R BE THE MEMBER RELATION AND S THE OWNER RELATION, FRAGMENTED AS {R1, R2, …, RN} AND {S1, S2, …, SN}, AND LET A BE THEIR COMMON (JOIN) ATTRIBUTE. THEN, FOR EACH TUPLE T OF RI, THERE SHOULD BE A TUPLE T' OF SI SUCH THAT T[A] = T'[A]. THIS IS KNOWN AS REFERENTIAL INTEGRITY. • THE DERIVED FRAGMENTATION OF C_ADDRESS IS COMPLETE, BECAUSE THE VALUES OF THE COMMON ATTRIBUTE CID IN THE FRAGMENTS CUSTOMERI AND C_ADDRESSI ARE THE SAME. FOR EXAMPLE, EVERY CID VALUE IN C_ADDRESS1 ALSO APPEARS IN CUSTOMER1 AND IN NO OTHER CUSTOMER FRAGMENT, ETC.
  • 67. • RECONSTRUCTION: RECONSTRUCTION OF A RELATION FROM ITS FRAGMENTS IS PERFORMED BY THE UNION OPERATOR IN BOTH THE PRIMARY AND THE DERIVED HORIZONTAL FRAGMENTATION.
  • 68. • 2)VERTICAL FRAGMENTATION: VERTICAL FRAGMENTATION REFERS TO THE PROCESS OF DECOMPOSING A TABLE VERTICALLY BY ATTRIBUTES (COLUMNS). IN THIS FRAGMENTATION, SOME OF THE ATTRIBUTES ARE STORED IN ONE SYSTEM AND THE REST ARE STORED IN OTHER SYSTEMS. THIS IS BECAUSE EACH SITE MAY NOT NEED ALL COLUMNS OF A TABLE. • =>EACH SITE MAY NOT NEED ALL THE ATTRIBUTES OF A RELATION. • =>A VERTICAL FRAGMENT IS A SUBSET OF A RELATION CREATED FROM A SUBSET OF ITS COLUMNS.
  • 69. • =>A VF OF A RELATION R PRODUCES FRAGMENTS R1, R2, R3, …, RN, EACH OF WHICH CONTAINS A SUBSET OF THE ATTRIBUTES OF R AND THE PRIMARY KEY OF R. • =>RECONSTRUCTION OF A VF USES THE JOIN OPERATOR.
  • 70. • =>IN ORDER TO TAKE CARE OF RESTORATION, EACH FRAGMENT MUST CONTAIN THE PRIMARY KEY FIELD(S) OF THE TABLE. THE FRAGMENTATION SHOULD BE DONE IN SUCH A MANNER THAT WE CAN REBUILD THE TABLE FROM THE FRAGMENTS BY TAKING THE NATURAL JOIN. TO MAKE THIS POSSIBLE, WE CAN ADD A SPECIAL ATTRIBUTE CALLED TUPLE-ID TO THE SCHEMA; ANY SUPER KEY CAN SERVE THE SAME PURPOSE. WITH IT, THE TUPLES OR ROWS OF THE FRAGMENTS CAN BE LINKED TOGETHER. THE PROJECTION IS AS FOLLOWS:
  • 71. • FOR EXAMPLE, FOR THE EMPLOYEE TABLE WE HAVE T1 AS:
      ENO  ENAME  DESIGN  TUPLE_ID
      101  A      ABC     1
      102  B      ABC     2
      103  C      ABC     3
      104  D      ABC     4
      105  E      ABC     5
  • 72. • πA1, A2, …, AN(T) • WHERE π IS THE RELATIONAL ALGEBRA PROJECTION OPERATOR • A1, …, AN ARE THE ATTRIBUTES OF T • T IS THE TABLE (RELATION)
  • 73. • THE SECOND SUB-TABLE OF THE RELATION AFTER VERTICAL FRAGMENTATION IS GIVEN AS FOLLOWS:
      SALARY  DEP  TUPLE_ID
      3000    1    1
      4000    2    2
      5500    3    3
      5000    1    4
      2000    4    5
  • 74. • THIS IS T2. TO GET BACK TO THE ORIGINAL T, WE JOIN THE TWO FRAGMENTS T1 AND T2 ON TUPLE_ID AS πEMPLOYEE(T1 ⋈ T2). • 3. MIXED FRAGMENTATION – THE COMBINATION OF VERTICAL FRAGMENTATION OF A TABLE FOLLOWED BY FURTHER HORIZONTAL FRAGMENTATION OF SOME FRAGMENTS IS CALLED MIXED OR HYBRID FRAGMENTATION. TO DEFINE THIS TYPE OF FRAGMENTATION WE USE THE SELECT AND PROJECT OPERATIONS OF RELATIONAL ALGEBRA. IN SOME SITUATIONS, HORIZONTAL OR VERTICAL FRAGMENTATION ALONE IS NOT ENOUGH TO DISTRIBUTE THE DATA FOR AN APPLICATION, AND IN THOSE CASES WE NEED MIXED FRAGMENTATION.
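  • THE VERTICAL SPLIT INTO T1 AND T2 AND THE JOIN THAT REBUILDS THE TABLE CAN BE SKETCHED IN PYTHON AS FOLLOWS. THIS IS AN ILLUSTRATION UNDER ASSUMPTIONS: THE FULL EMPLOYEE ROWS ARE ASSEMBLED FROM THE TWO FRAGMENT TABLES SHOWN ABOVE, TUPLE_ID IS ASSUMED TO TAKE THE UNIQUE VALUES 1 TO 5 IN BOTH FRAGMENTS, AND THE HELPER NAMES `project` AND `join_on_tuple_id` ARE HYPOTHETICAL.

```python
COLUMNS = ("ENO", "ENAME", "DESIGN", "SALARY", "DEP", "TUPLE_ID")

# Full EMPLOYEE rows; TUPLE_ID is assumed unique (1..5) so the join is lossless.
EMPLOYEE = [
    (101, "A", "ABC", 3000, 1, 1),
    (102, "B", "ABC", 4000, 2, 2),
    (103, "C", "ABC", 5500, 3, 3),
    (104, "D", "ABC", 5000, 1, 4),
    (105, "E", "ABC", 2000, 4, 5),
]


def project(rows, columns, wanted):
    """Relational projection: keep only the wanted columns, in that order."""
    idx = [columns.index(c) for c in wanted]
    return [tuple(r[i] for i in idx) for r in rows]


# Vertical fragments; both carry TUPLE_ID as their last column.
T1 = project(EMPLOYEE, COLUMNS, ("ENO", "ENAME", "DESIGN", "TUPLE_ID"))
T2 = project(EMPLOYEE, COLUMNS, ("SALARY", "DEP", "TUPLE_ID"))


def join_on_tuple_id(t1, t2):
    """Natural join of the two fragments on their shared TUPLE_ID column."""
    by_id = {row[-1]: row for row in t2}
    # Drop TUPLE_ID from the left row; the right row supplies it once.
    return [left[:-1] + by_id[left[-1]] for left in t1 if left[-1] in by_id]


rebuilt = join_on_tuple_id(T1, T2)
print(rebuilt == EMPLOYEE)
```

  • IF TWO ROWS SHARED A TUPLE_ID, THE LOOKUP TABLE WOULD KEEP ONLY ONE OF THEM AND THE REBUILT TABLE WOULD NO LONGER MATCH THE ORIGINAL, WHICH IS WHY THE LINKING ATTRIBUTE MUST BE A KEY.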
  • 75. • MIXED FRAGMENTATION CAN BE DONE IN TWO DIFFERENT WAYS: • THE FIRST METHOD IS TO FIRST CREATE A SET OR GROUP OF HORIZONTAL FRAGMENTS AND THEN CREATE VERTICAL FRAGMENTS FROM ONE OR MORE OF THE HORIZONTAL FRAGMENTS. • THE SECOND METHOD IS TO FIRST CREATE A SET OR GROUP OF VERTICAL FRAGMENTS AND THEN CREATE HORIZONTAL FRAGMENTS FROM ONE OR MORE OF THE VERTICAL FRAGMENTS. THE ORIGINAL RELATION CAN BE OBTAINED BY THE COMBINATION OF JOIN AND UNION OPERATIONS WHICH IS GIVEN AS FOLLOWS:
  • 76. • σP(πA1, A2, …, AN(T)) • πA1, A2, …, AN(σP(T)) • FOR EXAMPLE, FOR OUR EMPLOYEE TABLE, AN IMPLEMENTATION OF MIXED FRAGMENTATION IS πENAME, DESIGN(σENO < 104(EMPLOYEE)) • THE RESULT OF THIS FRAGMENTATION IS:
  • 77. • ENAME DESIGN • A ABC • B ABC • C ABC
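  • THE MIXED FRAGMENT ABOVE CAN BE REPRODUCED BY APPLYING A SELECTION FOLLOWED BY A PROJECTION. THE PYTHON SKETCH BELOW IS ILLUSTRATIVE ONLY: THE HELPER NAMES `select` AND `project` ARE HYPOTHETICAL, AND THE SELECTION PREDICATE ENO < 104 IS USED BECAUSE IT IS THE ONE THAT YIELDS THE ROWS A, B, AND C SHOWN IN THE RESULT.

```python
# EMPLOYEE rows restricted to the columns used by this example.
EMPLOYEE = [
    {"ENO": 101, "ENAME": "A", "DESIGN": "ABC"},
    {"ENO": 102, "ENAME": "B", "DESIGN": "ABC"},
    {"ENO": 103, "ENAME": "C", "DESIGN": "ABC"},
    {"ENO": 104, "ENAME": "D", "DESIGN": "ABC"},
    {"ENO": 105, "ENAME": "E", "DESIGN": "ABC"},
]


def select(rows, predicate):
    """Relational selection (sigma): keep the rows satisfying the predicate."""
    return [r for r in rows if predicate(r)]


def project(rows, attrs):
    """Relational projection (pi): keep only the named attributes."""
    return [tuple(r[a] for a in attrs) for r in rows]


# Horizontal step first (ENO < 104), then vertical step (ENAME, DESIGN).
fragment = project(select(EMPLOYEE, lambda r: r["ENO"] < 104), ["ENAME", "DESIGN"])
print(fragment)
```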
  • 78. • HYBRID FRAGMENTATION: • IN HYBRID FRAGMENTATION, A COMBINATION OF HORIZONTAL AND VERTICAL FRAGMENTATION TECHNIQUES IS USED. THIS IS THE MOST FLEXIBLE FRAGMENTATION TECHNIQUE SINCE IT GENERATES FRAGMENTS WITH MINIMAL EXTRANEOUS INFORMATION. HOWEVER, RECONSTRUCTION OF THE ORIGINAL TABLE IS OFTEN AN EXPENSIVE TASK.
  • 79. • HYBRID FRAGMENTATION CAN BE DONE IN TWO ALTERNATIVE WAYS − • AT FIRST, GENERATE A SET OF HORIZONTAL FRAGMENTS; THEN GENERATE VERTICAL FRAGMENTS FROM ONE OR MORE OF THE HORIZONTAL FRAGMENTS. • AT FIRST, GENERATE A SET OF VERTICAL FRAGMENTS; THEN GENERATE HORIZONTAL FRAGMENTS FROM ONE OR MORE OF THE VERTICAL FRAGMENTS
  • 80. • WHAT IS A TRANSACTION IN DISTRIBUTED DBMS: • A PROGRAM THAT INCLUDES A COLLECTION OF DATABASE OPERATIONS WHICH ARE EXECUTED AS A LOGICAL UNIT OF PROCESSING THE DATA IS KNOWN AS A TRANSACTION. IN A TRANSACTION ONE OR MORE OF THE DATA OPERATIONS ARE PERFORMED SUCH AS INSERT, UPDATE, DELETE OR RETRIEVE.
  • 81. • TYPES: • 1)LOCAL TRANSACTION • 2)GLOBAL TRANSACTION • 1)LOCAL TRANSACTION: SOME SOFTWARE PLATFORMS DO NOT PROVIDE TRANSACTION COORDINATION AS PART OF THE KERNEL OPERATING SYSTEM. WHEN, INSTEAD, EACH RESOURCE MANAGER INVOLVED IS SEPARATELY COORDINATING ITS OWN CHANGES, AND ONLY ITS CHANGES, THE TRANSACTION IS KNOWN AS A LOCAL TRANSACTION. ...
  • 82. • WITH RESPECT TO A TRANSACTION, A SITE CAN PLAY ONE OF TWO ROLES: • A)COORDINATING SITE: • IT IS THE SITE WHERE THE TRANSACTION IS INITIATED. • B)PARTICIPATING SITE: • THESE ARE THE SITES WHERE SUBTRANSACTIONS ARE EXECUTED.
  • 83. • 2)GLOBAL TRANSACTION: • A GLOBAL TRANSACTION IS A MECHANISM THAT ALLOWS A SET OF PROGRAMMING TASKS, POTENTIALLY USING MORE THAN ONE RESOURCE MANAGER AND POTENTIALLY EXECUTING ON MULTIPLE SERVERS, TO BE TREATED AS ONE LOGICAL UNIT. ... A GLOBAL TRANSACTION MAY BE COMPOSED OF SEVERAL LOCAL TRANSACTIONS, EACH ACCESSING A SINGLE RESOURCE MANAGER.
  • 84. • TRANSACTION MANAGER (TM): • EACH SITE HAS ITS OWN TM. IT MANAGES THE EXECUTION OF THOSE TRANSACTIONS OR SUBTRANSACTIONS THAT ACCESS DATA STORED AT THAT SITE. • TRANSACTION COORDINATOR: • IT IS PRESENT AT EACH SITE AND IS RESPONSIBLE FOR COORDINATING THE EXECUTION OF ALL TRANSACTIONS INITIATED AT THAT SITE.
  • 85. • DISTRIBUTED TRANSACTION ARCHITECTURE: • A DISTRIBUTED TRANSACTION IS A SET OF OPERATIONS ON DATA THAT IS PERFORMED ACROSS TWO OR MORE DATA REPOSITORIES (ESPECIALLY DATABASES). IT IS TYPICALLY COORDINATED ACROSS SEPARATE NODES CONNECTED BY A NETWORK, BUT MAY ALSO SPAN MULTIPLE DATABASES ON A SINGLE SERVER.
  • 87. • ACID PROPERTIES: • A TRANSACTION IS A SINGLE LOGICAL UNIT OF WORK WHICH ACCESSES AND POSSIBLY MODIFIES THE CONTENTS OF A DATABASE. TRANSACTIONS ACCESS DATA USING READ AND WRITE OPERATIONS. IN ORDER TO MAINTAIN CONSISTENCY IN A DATABASE, BEFORE AND AFTER THE TRANSACTION, CERTAIN PROPERTIES ARE FOLLOWED. THESE ARE CALLED ACID PROPERTIES.
  • 89. • ATOMICITY: BY THIS, WE MEAN THAT EITHER THE ENTIRE TRANSACTION TAKES PLACE AT ONCE OR IT DOESN’T HAPPEN AT ALL. THERE IS NO MIDWAY, I.E. TRANSACTIONS DO NOT OCCUR PARTIALLY. EACH TRANSACTION IS CONSIDERED AS ONE UNIT AND EITHER RUNS TO COMPLETION OR IS NOT EXECUTED AT ALL. IT INVOLVES THE FOLLOWING TWO OPERATIONS. • ABORT: IF A TRANSACTION ABORTS, CHANGES MADE TO THE DATABASE ARE NOT VISIBLE. • COMMIT: IF A TRANSACTION COMMITS, CHANGES MADE ARE VISIBLE. • ATOMICITY IS ALSO KNOWN AS THE ‘ALL OR NOTHING RULE’.
  • 90. • CONSIDER THE FOLLOWING TRANSACTION T CONSISTING OF T1 AND T2: TRANSFER OF 100 FROM ACCOUNT X TO ACCOUNT Y.
  • 91. • IF THE TRANSACTION FAILS AFTER COMPLETION OF T1 BUT BEFORE COMPLETION OF T2 (SAY, AFTER WRITE(X) BUT BEFORE WRITE(Y)), THEN THE AMOUNT HAS BEEN DEDUCTED FROM X BUT NOT ADDED TO Y. THIS RESULTS IN AN INCONSISTENT DATABASE STATE. THEREFORE, THE TRANSACTION MUST BE EXECUTED IN ITS ENTIRETY IN ORDER TO ENSURE CORRECTNESS OF THE DATABASE STATE.
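  • THE ALL-OR-NOTHING BEHAVIOUR OF THE TRANSFER CAN BE SKETCHED IN PYTHON AS FOLLOWS. THIS IS A MINIMAL ILLUSTRATION, NOT HOW A REAL DBMS IMPLEMENTS ATOMICITY: THE IN-MEMORY DICTIONARY, THE `fail_midway` FLAG, AND THE NAMES `transfer` AND `TransferFailed` ARE HYPOTHETICAL, AND ROLLBACK IS DONE BY RESTORING A COPIED BEFORE-IMAGE. THE STARTING BALANCES X = 500 AND Y = 200 ARE THE ONES USED IN THE CONSISTENCY EXAMPLE.

```python
class TransferFailed(Exception):
    """Raised to simulate a crash between the two halves of the transfer."""


def transfer(accounts, src, dst, amount, fail_midway=False):
    """Move `amount` from src to dst; either both writes happen or neither."""
    snapshot = dict(accounts)       # before-image used for rollback
    try:
        accounts[src] -= amount     # T1: read(X), X := X - amount, write(X)
        if fail_midway:
            raise TransferFailed("failure after write(X), before write(Y)")
        accounts[dst] += amount     # T2: read(Y), Y := Y + amount, write(Y)
    except TransferFailed:
        accounts.clear()
        accounts.update(snapshot)   # abort: undo the partial change
        return False
    return True                     # commit: both changes stay visible


accounts = {"X": 500, "Y": 200}
aborted = transfer(accounts, "X", "Y", 100, fail_midway=True)
after_abort = dict(accounts)
committed = transfer(accounts, "X", "Y", 100)
print(aborted, after_abort, committed, accounts)
```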
  • 92. • CONSISTENCY THIS MEANS THAT INTEGRITY CONSTRAINTS MUST BE MAINTAINED SO THAT THE DATABASE IS CONSISTENT BEFORE AND AFTER THE TRANSACTION. IT REFERS TO THE CORRECTNESS OF A DATABASE. REFERRING TO THE EXAMPLE ABOVE, THE TOTAL AMOUNT BEFORE AND AFTER THE TRANSACTION MUST BE MAINTAINED. TOTAL BEFORE T OCCURS = 500 + 200 = 700. TOTAL AFTER T OCCURS = 400 + 300 = 700. THEREFORE, DATABASE IS CONSISTENT. INCONSISTENCY OCCURS IN CASE T1 COMPLETES BUT T2 FAILS. AS A RESULT T IS INCOMPLETE.
  • 93. • ISOLATION: THIS PROPERTY ENSURES THAT MULTIPLE TRANSACTIONS CAN OCCUR CONCURRENTLY WITHOUT LEADING TO AN INCONSISTENT DATABASE STATE. TRANSACTIONS OCCUR INDEPENDENTLY WITHOUT INTERFERENCE. CHANGES OCCURRING IN A PARTICULAR TRANSACTION WILL NOT BE VISIBLE TO ANY OTHER TRANSACTION UNTIL THAT PARTICULAR CHANGE IN THAT TRANSACTION IS WRITTEN TO MEMORY OR HAS BEEN COMMITTED. THIS PROPERTY ENSURES THAT EXECUTING TRANSACTIONS CONCURRENTLY RESULTS IN A STATE THAT IS EQUIVALENT TO A STATE ACHIEVED IF THEY WERE EXECUTED SERIALLY IN SOME ORDER.
  • 94. • LET X = 500, Y = 500. CONSIDER TWO TRANSACTIONS T AND T’’.
  • 95. • SUPPOSE T HAS BEEN EXECUTED TILL READ(Y) AND THEN T’’ STARTS. AS A RESULT, INTERLEAVING OF OPERATIONS TAKES PLACE, DUE TO WHICH T’’ READS THE CORRECT VALUE OF X BUT AN INCORRECT VALUE OF Y, AND THE SUM COMPUTED BY T’’ (X + Y = 50,000 + 500 = 50,500) IS THUS NOT CONSISTENT WITH THE SUM AT THE END OF TRANSACTION T (X + Y = 50,000 + 450 = 50,450). THIS RESULTS IN DATABASE INCONSISTENCY, DUE TO A LOSS OF 50 UNITS. HENCE, TRANSACTIONS MUST TAKE PLACE IN ISOLATION AND CHANGES SHOULD BE VISIBLE ONLY AFTER THEY HAVE BEEN MADE TO THE MAIN MEMORY.
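  • THE INTERLEAVING DESCRIBED ABOVE CAN BE REPLAYED STEP BY STEP IN PYTHON. THIS SKETCH RESTS ON AN ASSUMPTION: THE OPERATIONS OF T (X := X * 100, THEN Y := Y - 50) AND OF T’’ (Z := X + Y) ARE INFERRED FROM THE NUMBERS 50,000, 450, AND 50,500 IN THE TEXT, SINCE THE TRANSACTION LISTING ITSELF IS NOT REPRODUCED HERE. THE FUNCTION NAMES ARE HYPOTHETICAL.

```python
def run_interleaved():
    """T runs up to read(Y), then T'' runs completely, then T finishes."""
    db = {"X": 500, "Y": 500}
    # T: read(X), X := X * 100, write(X), read(Y)
    db["X"] = db["X"] * 100
    y_read_by_t = db["Y"]
    # T'': read(X), read(Y), Z := X + Y  (sees new X but old Y)
    z_seen_by_t2 = db["X"] + db["Y"]
    # T resumes: Y := Y - 50, write(Y)
    db["Y"] = y_read_by_t - 50
    return z_seen_by_t2, db["X"] + db["Y"]


def run_serial():
    """T runs to completion first, then T'' computes the sum."""
    db = {"X": 500, "Y": 500}
    db["X"] *= 100
    db["Y"] -= 50
    return db["X"] + db["Y"]


seen, final = run_interleaved()
print(seen, final, run_serial())
```

  • THE SUM SEEN BY T’’ IN THE INTERLEAVED RUN DIFFERS FROM THE FINAL SUM BY 50, WHILE THE SERIAL RUN SHOWS NO SUCH GAP.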
  • 96. • DURABILITY: THIS PROPERTY ENSURES THAT ONCE THE TRANSACTION HAS COMPLETED EXECUTION, THE UPDATES AND MODIFICATIONS TO THE DATABASE ARE STORED IN AND WRITTEN TO DISK AND THEY PERSIST EVEN IF A SYSTEM FAILURE OCCURS. THESE UPDATES NOW BECOME PERMANENT AND ARE STORED IN NON-VOLATILE MEMORY. THE EFFECTS OF THE TRANSACTION, THUS, ARE NEVER LOST.
  • 97. • THE ACID PROPERTIES, IN TOTALITY, PROVIDE A MECHANISM TO ENSURE CORRECTNESS AND CONSISTENCY OF A DATABASE IN SUCH A WAY THAT EACH TRANSACTION IS A GROUP OF OPERATIONS THAT ACTS AS A SINGLE UNIT, PRODUCES CONSISTENT RESULTS, ACTS IN ISOLATION FROM OTHER OPERATIONS, AND MAKES UPDATES THAT ARE DURABLY STORED.
  • 98. • CONCURRENCY CONTROL: • IT IS THE PROCESS OF MANAGING SIMULTANEOUS EXECUTION OF TRANSACTION IN A SHARED DB. • CONCURRENCY CONTROL IS PROVIDED IN A DATABASE TO: • (I) ENFORCE ISOLATION AMONG TRANSACTIONS. • (II) PRESERVE DATABASE CONSISTENCY THROUGH CONSISTENCY PRESERVING EXECUTION OF TRANSACTIONS. • (III) RESOLVE READ-WRITE AND WRITE-READ CONFLICTS.
  • 99. • LOCK BASED PROTOCOLS – A LOCK IS A VARIABLE ASSOCIATED WITH A DATA ITEM THAT DESCRIBES THE STATUS OF THE DATA ITEM WITH RESPECT TO THE OPERATIONS THAT CAN BE APPLIED TO IT. LOCKS SYNCHRONIZE THE ACCESS OF CONCURRENT TRANSACTIONS TO THE DATABASE ITEMS. THIS PROTOCOL REQUIRES THAT ALL DATA ITEMS BE ACCESSED IN A MUTUALLY EXCLUSIVE MANNER. IN OTHER WORDS, A LOCK GUARANTEES EXCLUSIVE USE OF A DATA ITEM TO THE CURRENT TRANSACTION. • =>ACCESS THE DATA ITEM (LOCK ACQUIRE) • =>AFTER COMPLETION OF THE TRANSACTION (RELEASE)
  • 100. • 1) SHARED LOCK (S): ALSO KNOWN AS READ-ONLY LOCK. AS THE NAME SUGGESTS, IT CAN BE SHARED BETWEEN TRANSACTIONS, BECAUSE WHILE HOLDING THIS LOCK A TRANSACTION DOES NOT HAVE PERMISSION TO UPDATE THE DATA ITEM. AN S-LOCK IS REQUESTED USING THE LOCK-S INSTRUCTION. • 2) EXCLUSIVE LOCK (X): THE DATA ITEM CAN BE BOTH READ AND WRITTEN. THIS LOCK IS EXCLUSIVE AND CANNOT BE HELD BY MORE THAN ONE TRANSACTION ON THE SAME DATA ITEM AT THE SAME TIME. AN X-LOCK IS REQUESTED USING THE LOCK-X INSTRUCTION.
  • 101. • LOCK COMPATIBILITY MATRIX –
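  • FOR THE SHARED AND EXCLUSIVE LOCKS DEFINED ABOVE, THE COMPATIBILITY MATRIX HAS A SINGLE COMPATIBLE PAIR: S WITH S. A MINIMAL PYTHON SKETCH OF THAT MATRIX FOLLOWS; THE NAMES `COMPATIBLE` AND `can_grant` ARE HYPOTHETICAL, AND A REAL LOCK MANAGER WOULD ALSO NEED WAIT QUEUES AND DEADLOCK HANDLING, WHICH ARE LEFT OUT.

```python
# (mode already held, mode requested) -> may both be held at the same time?
COMPATIBLE = {
    ("S", "S"): True,    # many readers may share a data item
    ("S", "X"): False,   # a writer must wait for the readers
    ("X", "S"): False,   # a reader must wait for the writer
    ("X", "X"): False,   # two writers never hold the item together
}


def can_grant(requested, held_modes):
    """Grant the request only if it is compatible with every lock already held."""
    return all(COMPATIBLE[(held, requested)] for held in held_modes)


print(can_grant("S", ["S", "S"]), can_grant("X", ["S"]), can_grant("X", []))
```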
  • 102. • CONVERSION OF LOCK: => IN TWO-PHASE LOCKING, ACQUIRING LOCKS ON DATA HAPPENS IN THE FIRST (GROWING) PHASE AND RELEASING LOCKS ON DATA HAPPENS IN THE SECOND (SHRINKING) PHASE. CONVERTING A LOCK FROM SHARED TO EXCLUSIVE (UPGRADING) CAN BE DONE ONLY IN THE FIRST PHASE, WHILE CONVERTING FROM EXCLUSIVE TO SHARED (DOWNGRADING) IS DONE IN THE RELEASE PHASE.
  • 103. • TWO PHASE COMMIT PROTOCOL: • COMMIT REQUEST OR VOTING PHASE. • THE COORDINATOR SENDS A QUERY-TO-COMMIT MESSAGE TO ALL PARTICIPANTS AND WAITS UNTIL IT HAS RECEIVED A REPLY FROM ALL PARTICIPANTS. • THE PARTICIPANTS EXECUTE THE TRANSACTION UP TO THE POINT WHERE THEY WILL BE ASKED TO COMMIT. THEY EACH WRITE AN ENTRY TO THEIR UNDO LOG AND AN ENTRY TO THEIR REDO LOG.
  • 104. • EACH PARTICIPANT REPLIES WITH AN AGREEMENT MESSAGE (PARTICIPANT VOTES YES TO COMMIT), IF THE PARTICIPANT'S ACTIONS SUCCEEDED, OR AN ABORT MESSAGE (PARTICIPANT VOTES NO, NOT TO COMMIT), IF THE PARTICIPANT EXPERIENCES A FAILURE THAT WILL MAKE IT IMPOSSIBLE TO COMMIT.
  • 105. • COMMIT OR COMPLETION PHASE OR DECISION PHASE: • IF ALL PARTICIPANTS VOTED YES, THE COORDINATOR SENDS A COMMIT MESSAGE TO ALL THE PARTICIPANTS. • EACH PARTICIPANT COMPLETES THE OPERATION AND RELEASES ALL THE LOCKS AND RESOURCES HELD DURING THE TRANSACTION.
  • 106. • EACH PARTICIPANT SENDS AN ACKNOWLEDGEMENT TO THE COORDINATOR. • THE COORDINATOR COMPLETES THE TRANSACTION WHEN ALL ACKNOWLEDGEMENTS HAVE BEEN RECEIVED. • TWO RULES: • 1) IF A <READY T> MESSAGE IS RECEIVED FROM ALL SITES, THE TRANSACTION IS COMMITTED. • 2) IF AT LEAST ONE <NOT READY T> MESSAGE IS RECEIVED FROM A SITE, THE TRANSACTION IS ABORTED.
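  • THE TWO PHASES AND THE TWO DECISION RULES ABOVE CAN BE SKETCHED IN PYTHON AS FOLLOWS. THIS IS A SINGLE-PROCESS ILLUSTRATION UNDER ASSUMPTIONS: MESSAGES ARE MODELLED AS METHOD CALLS, THERE ARE NO TIMEOUTS, LOGS, OR FAILURES DURING THE PROTOCOL, AND THE NAMES `Participant`, `prepare`, AND `two_phase_commit` ARE HYPOTHETICAL.

```python
class Participant:
    """A site taking part in the transaction (hypothetical, in-memory)."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.state = "INIT"

    def prepare(self):
        """Voting phase: answer READY (True) or NOT READY (False)."""
        self.state = "READY" if self.can_commit else "ABORTED"
        return self.can_commit

    def commit(self):
        self.state = "COMMITTED"

    def abort(self):
        self.state = "ABORTED"


def two_phase_commit(participants):
    """Coordinator logic: collect every vote, then broadcast one decision."""
    votes = [p.prepare() for p in participants]        # phase 1: voting
    decision = "COMMIT" if all(votes) else "ABORT"     # rules 1 and 2
    for p in participants:                             # phase 2: decision
        if decision == "COMMIT":
            p.commit()
        else:
            p.abort()
    return decision


ok_sites = [Participant("S1"), Participant("S2")]
mixed_sites = [Participant("S1"), Participant("S2", can_commit=False)]
print(two_phase_commit(ok_sites), two_phase_commit(mixed_sites))
```

  • A SINGLE NOT READY VOTE IS ENOUGH TO ABORT THE TRANSACTION AT EVERY SITE, INCLUDING THE SITES THAT VOTED READY.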
  • 107. • RECOVERY IN DDB • POSSIBLE ERRORS: • 1)LOSS OF MESSAGES. • 2)FAILURE OF THE SITE AT WHICH A TRANSACTION IS RUNNING. • 3)COMMUNICATION LINK DOWN.
  • 108. • FAILURE OF A PARTICIPATING SITE: • 1)THE SITE FAILS BEFORE SENDING <READY T>. • 2)THE SITE FAILS AFTER SENDING <READY T>.
  • 109. • HANDLING A FAILURE OF A PARTICIPATING SITE: • 1. THE RESPONSE OF THE TRANSACTION COORDINATOR (TC) OF TRANSACTION T. • IF THE FAILED SITE HAS NOT SENT ANY <READY T> MESSAGE, THE TC CANNOT DECIDE TO COMMIT THE TRANSACTION [REMEMBER, IN A DISTRIBUTED DATABASE ALL THE PARTICIPATING SITES MUST BE READY TO COMMIT. IF EVEN ONE SITE IS NOT READY, THE WHOLE TRANSACTION NEEDS TO BE ABORTED BY THE TC]. HENCE, THE TRANSACTION T SHOULD BE ABORTED AND THE OTHER PARTICIPATING SITES SHOULD BE INFORMED.
  • 110. • 2. THE RESPONSE OF THE FAILED SITE WHEN IT RECOVERS. • WHEN IT RECOVERS FROM THE FAILURE, THE RECOVERING SITE SI MUST IDENTIFY THE FATE OF THE TRANSACTIONS THAT WERE IN PROGRESS WHEN SI FAILED. THIS CAN BE DONE BY EXAMINING THE LOG FILE ENTRIES OF SITE SI.
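  • HOW A RECOVERING SITE MIGHT READ ITS LOG CAN BE SKETCHED IN PYTHON AS FOLLOWS. THE DECISION RULES IN THIS SKETCH ARE AN ASSUMPTION BASED ON THE USUAL TEXTBOOK TREATMENT OF TWO-PHASE COMMIT RECOVERY AND ARE NOT SPELLED OUT IN THE TEXT ABOVE; THE LOG FORMAT (PAIRS OF RECORD KIND AND TRANSACTION NAME) AND THE FUNCTION NAME `recovery_action` ARE HYPOTHETICAL.

```python
def recovery_action(log, txn):
    """Decide what a recovering site does with `txn`, given its local log.

    `log` is a list of (kind, transaction) pairs, e.g. ("ready", "T1").
    """
    kinds = {kind for kind, t in log if t == txn}
    if "commit" in kinds:
        return "REDO"              # the decision was commit: reapply it
    if "abort" in kinds:
        return "UNDO"              # the decision was abort: roll back
    if "ready" in kinds:
        return "ASK_COORDINATOR"   # voted READY but never saw the decision
    return "UNDO"                  # never voted, so T cannot have committed


log = [("ready", "T1"), ("commit", "T1"), ("ready", "T2"), ("start", "T3")]
actions = {t: recovery_action(log, t) for t in ("T1", "T2", "T3")}
print(actions)
```

  • THE THIRD CASE IS THE ONE THAT MAKES THE SITE DEPEND ON OTHERS: AFTER VOTING READY IT MAY NOT DECIDE ON ITS OWN AND HAS TO LEARN THE OUTCOME FROM THE COORDINATOR OR FROM ANOTHER SITE.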
  • 111. • FAILURE OF COORDINATING SITE: • 1) AFTER THE COORDINATOR WRITES THE PREPARE LOG RECORD AND BEFORE THE GATEWAY COMMIT PHASE. • 2) AFTER THE COORDINATOR SENDS A COMMIT MESSAGE TO THE GATEWAY BUT BEFORE IT RECEIVES A REPLY. • 3) AFTER GATEWAY COMMIT PHASE BUT BEFORE THE COORDINATOR WRITES A COMMIT RECORD TO THE LOGICAL LOG
  • 112. • HANDLING THE FAILURE OF A COORDINATOR SITE • LET US SUPPOSE THAT THE COORDINATOR SITE FAILED DURING EXECUTION OF 2 PHASE COMMIT (2PC) PROTOCOL FOR A TRANSACTION T. THIS SITUATION CAN BE HANDLED IN TWO WAYS; • THE OTHER SITES WHICH ARE PARTICIPATING IN THE TRANSACTION T MAY TRY TO DECIDE THE FATE OF THE TRANSACTION. THAT IS, THEY MAY TRY TO DECIDE ON COMMIT OR ABORT OF T USING THE CONTROL MESSAGES AVAILABLE IN EVERY SITE. • THE SECOND WAY IS TO WAIT UNTIL THE COORDINATOR SITE RECOVERS.
  • 113. • QUERY PROCESSING IN DISTRIBUTED DBMS • QUERY PROCESSING IN A DISTRIBUTED DATABASE MANAGEMENT SYSTEM REQUIRES THE TRANSMISSION OF DATA BETWEEN THE COMPUTERS IN A NETWORK. A DISTRIBUTION STRATEGY FOR A QUERY IS THE ORDERING OF DATA TRANSMISSIONS AND LOCAL DATA PROCESSING IN A DATABASE SYSTEM. GENERALLY, A QUERY IN A DISTRIBUTED DBMS REQUIRES DATA FROM MULTIPLE SITES, AND TRANSMITTING THIS DATA BETWEEN SITES INCURS COMMUNICATION COSTS. QUERY PROCESSING IN A DISTRIBUTED DBMS DIFFERS FROM QUERY PROCESSING IN A CENTRALIZED DBMS BECAUSE OF THIS COMMUNICATION COST OF DATA TRANSFER OVER THE NETWORK. THE TRANSMISSION COST IS LOW WHEN SITES ARE CONNECTED THROUGH HIGH-SPEED NETWORKS AND IS QUITE SIGNIFICANT IN OTHER NETWORKS.
  • 114. • 1. COSTS (TRANSFER OF DATA) OF DISTRIBUTED QUERY PROCESSING: • IN DISTRIBUTED QUERY PROCESSING, THE DATA TRANSFER COST MEANS THE COST OF TRANSFERRING INTERMEDIATE FILES TO OTHER SITES FOR PROCESSING PLUS THE COST OF TRANSFERRING THE FINAL RESULT FILES TO THE LOCATION WHERE THE RESULT IS REQUIRED. LET’S SAY THAT A USER SENDS A QUERY TO SITE S1, WHICH REQUIRES DATA FROM ITS OWN SITE AND ALSO FROM ANOTHER SITE S2. THERE ARE THREE STRATEGIES TO PROCESS THIS QUERY, WHICH ARE GIVEN BELOW:
  • 115. • 1) WE CAN TRANSFER THE DATA FROM S2 TO S1 AND THEN PROCESS THE QUERY. • 2) WE CAN TRANSFER THE DATA FROM S1 TO S2 AND THEN PROCESS THE QUERY. • 3) WE CAN TRANSFER THE DATA FROM S1 AND S2 TO S3 AND THEN PROCESS THE QUERY. THE CHOICE DEPENDS ON VARIOUS FACTORS SUCH AS THE SIZE OF THE RELATIONS AND THE RESULTS, THE COMMUNICATION COST BETWEEN DIFFERENT SITES, AND THE SITE AT WHICH THE RESULT WILL BE USED.
  • 116. COMMONLY, THE DATA TRANSFER COST IS CALCULATED IN TERMS OF THE SIZE OF THE MESSAGES. BY USING THE BELOW FORMULA, WE CAN CALCULATE THE DATA TRANSFER COST: DATA TRANSFER COST = C * SIZE WHERE C REFERS TO THE COST PER BYTE OF DATA TRANSFERRING AND SIZE IS THE NO. OF BYTES TRANSMITTED.
  • 117. • EXAMPLE: CONSIDER THE FOLLOWING TABLES EMPLOYEE AND DEPARTMENT • SITE1: EMPLOYEE(EID, NAME, SALARY, DID)
      EID            10 BYTES
      NAME           20 BYTES
      SALARY         20 BYTES
      DID            10 BYTES
      TOTAL RECORDS  1000
      RECORD SIZE    60 BYTES
  • 118. • SITE2: DEPARTMENT(DID, DNAME)
      DID            10 BYTES
      DNAME          20 BYTES
      TOTAL RECORDS  50
      RECORD SIZE    30 BYTES
  • 119. • EXAMPLE: FIND THE NAMES OF EMPLOYEES AND THEIR DEPARTMENT NAMES. ALSO, FIND THE AMOUNT OF DATA TRANSFERRED TO EXECUTE THIS QUERY WHEN THE QUERY IS SUBMITTED TO SITE 3. • ANSWER: THE QUERY IS SUBMITTED AT SITE 3, AND NEITHER OF THE TWO RELATIONS, EMPLOYEE AND DEPARTMENT, IS AVAILABLE AT SITE 3. SO, TO EXECUTE THIS QUERY, WE HAVE THREE STRATEGIES: • 1) TRANSFER BOTH TABLES, EMPLOYEE AND DEPARTMENT, TO SITE 3 AND JOIN THE TABLES THERE. THE TOTAL COST IS 1000 * 60 + 50 * 30 = 60,000 + 1,500 = 61,500 BYTES. • 2) TRANSFER THE TABLE EMPLOYEE TO SITE 2, JOIN THE TABLES AT SITE 2, AND THEN TRANSFER THE RESULT TO SITE 3. THE TOTAL COST IS 60 * 1000 + 40 * 1000 = 100,000 BYTES, SINCE AFTER MOVING EMPLOYEE WE HAVE TO TRANSFER 1000 RESULT TUPLES HOLDING NAME (20 BYTES) AND DNAME (20 BYTES), THAT IS 40 BYTES EACH, FROM SITE 2 TO SITE 3.
  • 120. • 3) TRANSFER THE TABLE DEPARTMENT TO SITE 1, JOIN THE TABLES AT SITE 1, AND THEN TRANSFER THE RESULT TO SITE 3. THE TOTAL COST IS 30 * 50 + 40 * 1000 = 41,500 BYTES, SINCE WE HAVE TO TRANSFER 1000 RESULT TUPLES HOLDING NAME AND DNAME FROM SITE 1 TO SITE 3, THAT IS 40 BYTES EACH. • NOW, IF THE OPTIMISATION CRITERION IS TO REDUCE THE AMOUNT OF DATA TRANSFERRED, STRATEGY 3 IS THE BEST CHOICE OF THE THREE.
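  • THE THREE STRATEGY COSTS CAN BE RECOMPUTED FROM THE TABLE SIZES WITH A SHORT PYTHON SKETCH. IT IS ILLUSTRATIVE ONLY: THE COST PER BYTE C IS TAKEN AS 1, THE NAMES `table_bytes` AND `strategy_costs` ARE HYPOTHETICAL, AND THE WIDTH OF A RESULT TUPLE IS LEFT AS A PARAMETER. A WIDTH OF 40 BYTES FOLLOWS FROM NAME (20) + DNAME (20) IN THE LISTED ATTRIBUTE SIZES, WHILE A WIDTH OF 60 BYTES TREATS EACH RESULT TUPLE AS BEING AS WIDE AS A FULL EMPLOYEE RECORD.

```python
# Sizes in bytes, taken from the EMPLOYEE and DEPARTMENT descriptions above.
EMPLOYEE = {"rows": 1000, "sizes": {"EID": 10, "NAME": 20, "SALARY": 20, "DID": 10}}
DEPARTMENT = {"rows": 50, "sizes": {"DID": 10, "DNAME": 20}}


def table_bytes(table):
    """Bytes needed to ship a whole table: rows * record size."""
    return table["rows"] * sum(table["sizes"].values())


def strategy_costs(result_tuple_bytes):
    """Bytes transferred by each strategy when the query is issued at site 3."""
    # One result tuple per employee, each result_tuple_bytes wide.
    result = EMPLOYEE["rows"] * result_tuple_bytes
    return {
        1: table_bytes(EMPLOYEE) + table_bytes(DEPARTMENT),  # ship both to site 3
        2: table_bytes(EMPLOYEE) + result,    # EMPLOYEE to site 2, result to site 3
        3: table_bytes(DEPARTMENT) + result,  # DEPARTMENT to site 1, result to site 3
    }


narrow = strategy_costs(40)   # NAME + DNAME
wide = strategy_costs(60)     # full EMPLOYEE record width
print(narrow, wide)
```

  • WITH 40-BYTE RESULT TUPLES STRATEGY 3 IS THE CHEAPEST; WITH 60-BYTE RESULT TUPLES STRATEGIES 1 AND 3 TIE AT 61,500 BYTES.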
  • 121. • 2. USING SEMI-JOIN IN DISTRIBUTED QUERY PROCESSING: • THE SEMI-JOIN OPERATION IS USED IN DISTRIBUTED QUERY PROCESSING TO REDUCE THE NUMBER OF TUPLES IN A TABLE BEFORE TRANSMITTING IT TO ANOTHER SITE. THIS REDUCTION IN THE NUMBER OF TUPLES REDUCES THE NUMBER AND THE TOTAL SIZE OF THE TRANSMISSIONS, WHICH ULTIMATELY REDUCES THE TOTAL COST OF DATA TRANSFER. LET’S SAY THAT WE HAVE TWO TABLES R1 AND R2 ON SITES S1 AND S2. WE FORWARD THE JOINING COLUMN OF ONE TABLE, SAY R1, TO THE SITE WHERE THE OTHER TABLE, SAY R2, IS LOCATED. THIS COLUMN IS JOINED WITH R2 AT THAT SITE. THE DECISION WHETHER TO REDUCE R1 OR R2 CAN ONLY BE MADE AFTER COMPARING THE ADVANTAGES OF REDUCING R1 WITH THOSE OF REDUCING R2. THUS, SEMI-JOIN IS A WELL-ORGANIZED SOLUTION TO REDUCE THE TRANSFER OF DATA IN DISTRIBUTED QUERY PROCESSING.
  • 122. • EXAMPLE: FIND THE AMOUNT OF DATA TRANSFERRED TO EXECUTE THE SAME QUERY GIVEN IN THE ABOVE EXAMPLE USING THE SEMI-JOIN OPERATION. • ANSWER: THE FOLLOWING STRATEGY CAN BE USED TO EXECUTE THE QUERY. • PROJECT THE REQUIRED ATTRIBUTES OF THE EMPLOYEE TABLE AT SITE 1 AND THEN TRANSFER THEM TO SITE 3. FOR THIS, WE TRANSFER πNAME, DID(EMPLOYEE), WHOSE TUPLES ARE 20 + 10 = 30 BYTES EACH, SO THE SIZE IS 30 * 1000 = 30,000 BYTES. • TRANSFER THE TABLE DEPARTMENT TO SITE 3 AND JOIN THE PROJECTED ATTRIBUTES OF EMPLOYEE WITH THIS TABLE. THE SIZE OF THE DEPARTMENT TABLE IS 30 * 50 = 1,500 BYTES. • APPLYING THE ABOVE SCHEME, THE AMOUNT OF DATA TRANSFERRED TO EXECUTE THE QUERY WILL BE 30,000 + 1,500 = 31,500 BYTES.
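  • THE TRANSFER SIZE FOR THIS PROJECTION-BASED STRATEGY CAN BE RECOMPUTED DIRECTLY FROM THE ATTRIBUTE SIZES LISTED FOR THE TWO TABLES (NAME 20, DID 10, DNAME 20 BYTES). THE PYTHON SKETCH BELOW IS ILLUSTRATIVE; THE NAMES `SIZES` AND `transfer_bytes` ARE HYPOTHETICAL.

```python
# Attribute sizes in bytes, as listed for EMPLOYEE and DEPARTMENT.
SIZES = {"EID": 10, "NAME": 20, "SALARY": 20, "DID": 10, "DNAME": 20}


def transfer_bytes(attributes, rows):
    """Bytes shipped when `rows` tuples carry only the given attributes."""
    return rows * sum(SIZES[a] for a in attributes)


employee_part = transfer_bytes(["NAME", "DID"], 1000)   # projected EMPLOYEE
department_part = transfer_bytes(["DID", "DNAME"], 50)  # whole DEPARTMENT
total = employee_part + department_part
print(employee_part, department_part, total)
```

  • SHIPPING ONLY NAME AND DID INSTEAD OF THE FULL 60-BYTE EMPLOYEE RECORD HALVES THE EMPLOYEE TRANSFER, WHICH IS WHERE THE SAVING OVER SHIPPING BOTH WHOLE TABLES COMES FROM.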
  • 123. • DISTRIBUTED SECURITY MODEL • => IT HELPS SECURE PROCESSES AND CHANNELS AGAINST UNAUTHORIZED ACCESS. • 1) PROTECTING OBJECTS: • SPECIFIES WHO IS ALLOWED TO PERFORM ACTIONS ON AN OBJECT. • 2) THREAT TO PROCESSES: • AN ATTACKER BETWEEN THE CLIENT AND THE SERVER CAN GAIN ACCESS TO THE SERVER.
  • 124. • 3)THREAT TO COMMUNICATION CHANNELS: • => A MALICIOUS USER CAN COPY, ALTER, OR INJECT MESSAGES ON THE CHANNEL.