8 Derived horizontal fragmentation


Distributed Database

  1. 1. Distributed Database Systems 22-11-2012
  2. 2. You must remember!
  3. 3. You must also remember!• Relational data languages are based on relational algebra• Relational algebra consists of a set of operators on relations, which include: – Selection – Projection – Union – Cartesian product
  4. 4. Cartesian Product• The Cartesian product of two relations, R of degree k1 and S of degree k2, is the set of (k1+k2)-tuples in which each result tuple is the concatenation of one tuple of R with one tuple of S, for all tuples of R and S (R × S)• Consider the relations EMP and PAY; EMP × PAY is:
  5. 5. Cartesian Product (EMP × PAY)
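The definition above can be sketched in a few lines of Python. The rows below are illustrative sample data chosen for this sketch, not the tables shown on the slide, and `cartesian_product` is a hypothetical helper name.

```python
from itertools import product

# Illustrative sample rows (assumed; not the slide's tables).
EMP = [("E1", "J. Doe", "Elect. Eng."), ("E2", "M. Smith", "Syst. Anal.")]
PAY = [("Elect. Eng.", 40000), ("Syst. Anal.", 34000), ("Programmer", 24000)]

def cartesian_product(r, s):
    """Concatenate every tuple of r with every tuple of s."""
    return [t_r + t_s for t_r, t_s in product(r, s)]

emp_x_pay = cartesian_product(EMP, PAY)
print(len(emp_x_pay))     # 2 * 3 = 6 result tuples
print(len(emp_x_pay[0]))  # degree 3 + 2 = 5
```

The result has |R| · |S| tuples of degree k1 + k2, which is why joins are normally evaluated without materializing the full product.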
  6. 6. Joins• Join is a derivative of the Cartesian product• There are various forms of join – Inner join (theta join, equi-join) – Outer join (left join, right join, full join) – Semi-join
  7. 7. Theta Join• Consider the theta-join of relations EMP and ASG over the join predicate EMP.ENO = ASG.ENO
  8. 8. Equi-Join• This example demonstrates a special case of theta-join called equi-join, in which the join predicate uses only equality
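A minimal sketch of a theta-join as a selection over the Cartesian product, assuming small illustrative EMP and ASG rows (not the slide's tables). `theta_join` is a hypothetical helper; passing an equality predicate makes it an equi-join.

```python
from itertools import product

# Illustrative sample rows (assumed): EMP(ENO, ENAME, TITLE), ASG(ENO, PNO, RESP, DUR).
EMP = [("E1", "J. Doe", "Elect. Eng."),
       ("E2", "M. Smith", "Syst. Anal."),
       ("E3", "A. Lee", "Mech. Eng.")]
ASG = [("E1", "P1", "Manager", 12),
       ("E2", "P1", "Analyst", 24),
       ("E2", "P2", "Analyst", 6)]

def theta_join(r, s, predicate):
    """Keep the concatenated tuples of r x s that satisfy the join predicate."""
    return [t_r + t_s for t_r, t_s in product(r, s) if predicate(t_r, t_s)]

# Equi-join: the predicate is the equality EMP.ENO = ASG.ENO.
emp_asg = theta_join(EMP, ASG, lambda e, a: e[0] == a[0])
print(len(emp_asg))  # 3: E1 matches once, E2 twice, E3 not at all
```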
  9. 9. Semi-Join• The semi-join of relation R, defined over the set of attributes A, by relation S, defined over the set of attributes B, is the subset of the tuples of R that participate in the join of R with S• The advantage of semi-join is that it decreases the number of tuples that need to be handled to form the join
  10. 10. Semi-Join• In centralized database systems, this is important because it usually results in a decreased number of secondary storage accesses by making better use of the memory.• It is even more important in distributed databases since it usually reduces the amount of data that needs to be transmitted between sites in order to evaluate a query.
  11. 11. Semi-Join• To demonstrate the difference between join and semi-join, let's consider the semi-join of EMP with PAY over the predicate EMP.TITLE = PAY.TITLE, that is, \(EMP \ltimes_{EMP.TITLE = PAY.TITLE} PAY\)
  12. 12. Semi-Join
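The difference between join and semi-join can be sketched as follows. The rows are illustrative sample data (PAY is deliberately missing one title so that the semi-join filters something), and `join` and `semi_join` are hypothetical helper names.

```python
# Illustrative sample rows (assumed; not the slide's tables).
EMP = [
    {"ENO": "E1", "ENAME": "J. Doe", "TITLE": "Elect. Eng."},
    {"ENO": "E2", "ENAME": "M. Smith", "TITLE": "Syst. Anal."},
    {"ENO": "E3", "ENAME": "A. Lee", "TITLE": "Mech. Eng."},
]
PAY = [
    {"TITLE": "Elect. Eng.", "SAL": 40000},
    {"TITLE": "Syst. Anal.", "SAL": 34000},
]

def join(r, s, attr):
    """Equi-join on a shared attribute: results carry the attributes of both."""
    return [{**t_r, **t_s} for t_r in r for t_s in s if t_r[attr] == t_s[attr]]

def semi_join(r, s, attr):
    """Tuples of r that participate in the join with s; only r's attributes."""
    s_values = {t_s[attr] for t_s in s}
    return [t_r for t_r in r if t_r[attr] in s_values]

joined = join(EMP, PAY, "TITLE")
reduced = semi_join(EMP, PAY, "TITLE")
print(sorted(joined[0]))   # ['ENAME', 'ENO', 'SAL', 'TITLE']
print(sorted(reduced[0]))  # ['ENAME', 'ENO', 'TITLE']
```

Only the set of join-attribute values of PAY is needed to compute the semi-join, which is the reason it reduces the data shipped between sites.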
  13. 13. Derived Horizontal Fragmentation• A derived horizontal fragmentation is defined on a member relation of a link according to a selection operation specified on its owner• It is important to remember two points – First, the link between the owner and the member relations is defined as an equi-join – Second, an equi-join can be implemented by means of semi-joins
  14. 14. Derived Horizontal Fragmentation• Accordingly, given a link L where owner(L) = S and member(L) = R, the derived horizontal fragments of R are defined as: \(R_i = R \ltimes S_i,\ 1 \le i \le w\)• where w is the maximum number of fragments that will be defined on R, and \(S_i = \sigma_{F_i}(S)\), where \(F_i\) is the formula according to which the primary horizontal fragment \(S_i\) is defined
  15. 15. Derived Horizontal Fragmentation• To carry out a derived horizontal fragmentation, three inputs are needed: – The set of partitions of the owner relation (PAY1, PAY2) – The member relation – The set of semi-join predicates between the owner and member (EMP.TITLE = PAY.TITLE)
  16. 16. Example
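  17. 17. Example• Consider link L1, where owner(L1) = PAY and member(L1) = EMP• We can group engineers into two groups according to their salary: those making less than or equal to $30,000, and those making more than $30,000• The two fragments EMP1 and EMP2 are defined as: \(EMP_1 = EMP \ltimes PAY_1\) and \(EMP_2 = EMP \ltimes PAY_2\), where \(PAY_1 = \sigma_{SAL \le 30000}(PAY)\) and \(PAY_2 = \sigma_{SAL > 30000}(PAY)\)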
  18. 18. Example• The result of this fragmentation is depicted as:
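The EMP/PAY example can be sketched in Python from its three inputs: the owner fragments, the member relation, and the semi-join attribute. The rows are illustrative sample data rather than the slide's tables, and `select`, `semi_join`, and `derived_fragments` are hypothetical helper names.

```python
# Illustrative sample rows (assumed; not the slide's tables).
PAY = [
    {"TITLE": "Elect. Eng.", "SAL": 40000},
    {"TITLE": "Syst. Anal.", "SAL": 34000},
    {"TITLE": "Mech. Eng.", "SAL": 27000},
    {"TITLE": "Programmer", "SAL": 24000},
]
EMP = [
    {"ENO": "E1", "ENAME": "J. Doe", "TITLE": "Elect. Eng."},
    {"ENO": "E2", "ENAME": "M. Smith", "TITLE": "Syst. Anal."},
    {"ENO": "E3", "ENAME": "A. Lee", "TITLE": "Mech. Eng."},
    {"ENO": "E4", "ENAME": "J. Miller", "TITLE": "Programmer"},
]

def select(r, predicate):
    """Primary horizontal fragment: the tuples of r satisfying the predicate."""
    return [t for t in r if predicate(t)]

def semi_join(r, s, attr):
    """Tuples of r whose attr value also appears in s."""
    s_values = {t[attr] for t in s}
    return [t for t in r if t[attr] in s_values]

def derived_fragments(member, owner_fragments, attr):
    """One member fragment per owner fragment: R_i = R semi-join S_i."""
    return [semi_join(member, s_i, attr) for s_i in owner_fragments]

PAY1 = select(PAY, lambda t: t["SAL"] <= 30000)
PAY2 = select(PAY, lambda t: t["SAL"] > 30000)
EMP1, EMP2 = derived_fragments(EMP, [PAY1, PAY2], "TITLE")
print([t["ENO"] for t in EMP1])  # ['E3', 'E4']
print([t["ENO"] for t in EMP2])  # ['E1', 'E2']
```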
  19. 19. Derived Horizontal Fragmentation• One potential complication needs attention• If a database schema has two or more links into a relation R, there may be more than one possible derived horizontal fragmentation of R• The choice of candidate fragmentation is based on two criteria – The fragmentation with better join characteristics – The fragmentation used in more applications
  20. 20. The fragmentation used in more Applications• This criterion is quite straightforward if we take into consideration the frequency with which applications access some data• Where possible, facilitate the accesses of the heavy users so that their total impact on system performance is minimized
  21. 21. The Fragmentation with better join characteristics• Consider the last example: the effect of this fragmentation is that the join of the EMP and PAY relations to answer the query is assisted – By performing it on smaller relations – By potentially performing joins in parallel
  22. 22. The Fragmentation with better join characteristics• The first point is obvious: the fragments of EMP are smaller than EMP itself• Therefore, it will be faster to join any fragment of PAY with any fragment of EMP than to work with the relations themselves• The second point, however, is more important and is at the heart of distributed databases• If, besides executing a number of queries at different sites, we can parallelize execution of one join query, the response time or throughput of the system can be expected to improve
  23. 23. The Fragmentation with better join characteristics• In the case of joins, this is possible under certain circumstances• Consider the join graph between the fragments of EMP and PAY: there is only one link coming in or going out of each fragment• Such a join graph is called a simple graph• The advantage of a design where the join relationship between fragments is simple is that the member and owner of a link can be allocated to one site, and the joins between different pairs of fragments can proceed independently and in parallel
  24. 24. The Fragmentation with better join characteristics
  25. 25. The Fragmentation with better join characteristics• Unfortunately, obtaining simple join graphs may not always be possible• In that case, the next desirable alternative is a design that results in a partitioned join graph• A partitioned graph consists of two or more subgraphs with no links between them• Fragments so obtained may not be distributed for parallel execution as easily as those obtained via simple join graphs, but the allocation is still possible
  26. 26. The Fragmentation with better join characteristics• Let us continue with the distribution design of the database we started before• We already decided on the fragmentation of relation EMP according to the fragmentation of PAY• Let's now consider ASG and assume that there are two applications – The first application finds the names of engineers who work at certain places; it runs on all three sites and accesses the information about the engineers who work on local projects with higher probability than those of projects at other locations – At each administrative site where employee records are managed, users would like to access the responsibilities on the projects that these employees work on and learn how long they will work on those projects
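One way to inspect a candidate design is to build the join graph between owner and member fragments and check its shape. The sketch below is an assumption-laden illustration: the fragment contents are sample data, the function names are hypothetical, "simple" is read as "every linked fragment has exactly one link", and "partitioned" as "two or more disconnected subgraphs".

```python
from collections import Counter

def join_graph(owner_frags, member_frags, attr):
    """Edges (owner fragment, member fragment) for pairs sharing a join value."""
    edges = set()
    for o_name, o_rows in owner_frags.items():
        o_values = {t[attr] for t in o_rows}
        for m_name, m_rows in member_frags.items():
            if any(t[attr] in o_values for t in m_rows):
                edges.add((o_name, m_name))
    return edges

def is_simple(edges):
    """True if every fragment that appears in the graph has exactly one link."""
    degree = Counter()
    for owner, member in edges:
        degree[("owner", owner)] += 1
        degree[("member", member)] += 1
    return all(d == 1 for d in degree.values())

def count_subgraphs(edges):
    """Number of connected subgraphs, via a small union-find."""
    parent = {}
    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x
    for owner, member in edges:
        parent[find(("owner", owner))] = find(("member", member))
    return len({find(x) for x in list(parent)})

# Illustrative fragments (assumed sample data).
PAY_FRAGS = {"PAY1": [{"TITLE": "Mech. Eng."}], "PAY2": [{"TITLE": "Elect. Eng."}]}
DERIVED = {"EMP1": [{"ENO": "E3", "TITLE": "Mech. Eng."}],
           "EMP2": [{"ENO": "E1", "TITLE": "Elect. Eng."}]}
BY_ENO = {"EMPa": [{"ENO": "E1", "TITLE": "Elect. Eng."},
                   {"ENO": "E3", "TITLE": "Mech. Eng."}],
          "EMPb": [{"ENO": "E6", "TITLE": "Elect. Eng."},
                   {"ENO": "E7", "TITLE": "Mech. Eng."}]}

derived_edges = join_graph(PAY_FRAGS, DERIVED, "TITLE")
other_edges = join_graph(PAY_FRAGS, BY_ENO, "TITLE")
print(is_simple(derived_edges), count_subgraphs(derived_edges))  # True 2
print(is_simple(other_edges), count_subgraphs(other_edges))      # False 1
```

Under this reading, the derived fragmentation yields a simple graph whose pairs can be joined independently, while fragmenting EMP on an unrelated attribute links every fragment to every other.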
  27. 27. The Fragmentation with better join characteristics• The first application results in a fragmentation of ASG according to the fragments PROJ1, PROJ3, PROJ4 and PROJ6 of PROJ obtained before
  28. 28. The Fragmentation with better join characteristics• Therefore, the derived fragmentation of ASG according to {PROJ1, PROJ3, PROJ4, PROJ6} is defined as: \(ASG_1 = ASG \ltimes PROJ_1\), \(ASG_2 = ASG \ltimes PROJ_3\), \(ASG_3 = ASG \ltimes PROJ_4\), \(ASG_4 = ASG \ltimes PROJ_6\)• The fragment instances are:
  29. 29. The Fragmentation with better join characteristics• The second query can be specified in SQL as:• where i = 1 or i = 2, depending on the site where the query is issued• The derived fragmentation of ASG according to the fragmentation of EMP is defined as: \(ASG_1 = ASG \ltimes EMP_1\) and \(ASG_2 = ASG \ltimes EMP_2\)
  30. 30. The Fragmentation with better join characteristics
  31. 31. The Fragmentation with better join characteristics• The example demonstrates two things: – Derived fragmentation may follow a chain where one relation is fragmented as a result of another one's design and it, in turn, causes the fragmentation of another relation (PAY → EMP → ASG) – Typically, there will be more than one candidate fragmentation for a relation (ASG); the final choice of the fragmentation scheme may be a decision problem addressed during allocation
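The chain PAY → EMP → ASG can be sketched by applying the same derivation step twice. The rows are illustrative sample data (not the slide's tables), and `semi_join` and `derive` are hypothetical helper names.

```python
def semi_join(r, s, attr):
    """Tuples of r whose attr value also appears in s."""
    s_values = {t[attr] for t in s}
    return [t for t in r if t[attr] in s_values]

def derive(member, owner_fragments, attr):
    """Derived horizontal fragments of member, one per owner fragment."""
    return [semi_join(member, frag, attr) for frag in owner_fragments]

# Illustrative sample rows (assumed).
PAY = [{"TITLE": "Elect. Eng.", "SAL": 40000}, {"TITLE": "Syst. Anal.", "SAL": 34000},
       {"TITLE": "Mech. Eng.", "SAL": 27000}, {"TITLE": "Programmer", "SAL": 24000}]
EMP = [{"ENO": "E1", "TITLE": "Elect. Eng."}, {"ENO": "E2", "TITLE": "Syst. Anal."},
       {"ENO": "E3", "TITLE": "Mech. Eng."}, {"ENO": "E4", "TITLE": "Programmer"}]
ASG = [{"ENO": "E1", "PNO": "P1"}, {"ENO": "E2", "PNO": "P1"}, {"ENO": "E2", "PNO": "P2"},
       {"ENO": "E3", "PNO": "P3"}, {"ENO": "E4", "PNO": "P2"}]

# Step 1: primary fragmentation of the owner PAY on salary.
pay_frags = [[t for t in PAY if t["SAL"] <= 30000],
             [t for t in PAY if t["SAL"] > 30000]]
# Step 2: EMP is fragmented according to PAY (link on TITLE).
emp_frags = derive(EMP, pay_frags, "TITLE")
# Step 3: ASG is fragmented according to EMP (link on ENO).
asg_frags = derive(ASG, emp_frags, "ENO")

print([(t["ENO"], t["PNO"]) for t in asg_frags[0]])  # [('E3', 'P3'), ('E4', 'P2')]
print(len(asg_frags[1]))                             # 3
```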
  32. 32. Checking of Correctness• We should now check the fragmentation algorithms discussed so far with respect to three correctness criteria – Completeness – Reconstruction – Disjointness
  33. 33. Completeness• The completeness of a primary horizontal fragmentation is based on the selection predicate used• As long as the selection predicates are complete, the resulting fragmentation is guaranteed to be complete as well
  34. 34. Completeness• The completeness of a derived horizontal fragmentation is somewhat more difficult to define• Every tuple of the member relation must join with some tuple of the owner; for example, there should be no ASG tuple whose project number is not also contained in PROJ• This rule is known as referential integrity
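A minimal sketch of why referential integrity matters for completeness: a member tuple with no matching owner tuple falls into no derived fragment. The rows are illustrative (the `P9` assignment is deliberately dangling), and `dangling_tuples` is a hypothetical helper name.

```python
def semi_join(r, s, attr):
    """Tuples of r whose attr value also appears in s."""
    s_values = {t[attr] for t in s}
    return [t for t in r if t[attr] in s_values]

def dangling_tuples(member, owner, attr):
    """Member tuples that violate referential integrity with respect to owner."""
    owner_values = {t[attr] for t in owner}
    return [t for t in member if t[attr] not in owner_values]

# Illustrative sample rows (assumed); P9 does not exist in PROJ.
PROJ1 = [{"PNO": "P1", "BUDGET": 150000}]
PROJ2 = [{"PNO": "P2", "BUDGET": 250000}]
PROJ = PROJ1 + PROJ2
ASG = [{"ENO": "E1", "PNO": "P1"}, {"ENO": "E2", "PNO": "P2"}, {"ENO": "E3", "PNO": "P9"}]

fragments = [semi_join(ASG, frag, "PNO") for frag in (PROJ1, PROJ2)]
covered = sum(len(frag) for frag in fragments)

print(dangling_tuples(ASG, PROJ, "PNO"))  # [{'ENO': 'E3', 'PNO': 'P9'}]
print(covered, len(ASG))                  # 2 3
```

Because the dangling tuple is lost, the fragmentation is incomplete; with referential integrity enforced, `dangling_tuples` would return an empty list.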
  35. 35. Reconstruction• Reconstruction of a global relation from its fragments is performed by the union operator in both the primary and the derived horizontal fragmentation• Thus, for a relation R with fragmentation \(F_R = \{R_1, R_2, \ldots, R_w\}\), \(R = \bigcup_{R_i \in F_R} R_i\)
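Reconstruction by union can be checked directly. The sketch assumes a small illustrative EMP relation and two of its horizontal fragments; `reconstruct` is a hypothetical helper name.

```python
# Illustrative sample rows (assumed): EMP(ENO, ENAME, TITLE).
EMP = [("E1", "J. Doe", "Elect. Eng."),
       ("E2", "M. Smith", "Syst. Anal."),
       ("E3", "A. Lee", "Mech. Eng."),
       ("E4", "J. Miller", "Programmer")]
EMP1 = [EMP[2], EMP[3]]  # titles paid at most 30000 in the sample data
EMP2 = [EMP[0], EMP[1]]  # titles paid more than 30000 in the sample data

def reconstruct(fragments):
    """Union of all fragments, treating each relation as a set of tuples."""
    result = set()
    for fragment in fragments:
        result |= set(fragment)
    return result

print(reconstruct([EMP1, EMP2]) == set(EMP))  # True
```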
  36. 36. Disjointness• It is easier to establish disjointness of fragmentation for primary than for derived horizontal fragmentation• In PHF, disjointness is guaranteed as long as the minterm predicates determining the fragmentation are mutually exclusive
  37. 37. Example• In derived fragmentation, however, there is a semi-join involved that adds considerable complexity• Disjointness can be guaranteed if the join graph is simple; otherwise it is necessary to investigate actual tuple values• In general, we do not want a tuple of a member relation to join with two or more tuples of the owner relation when these tuples are in different fragments of the owner
  38. 38. Example• In fragmenting relation PAY, we use the minterm predicates M = {m1, m2}, where m1: SAL <= 30000 and m2: SAL > 30000• Since m1 and m2 are mutually exclusive, the fragmentation of PAY is disjoint• For relation EMP, however, we require that – Each engineer has a single title – Each title has a single salary value associated with it• Since these two rules follow from the semantics of the database, the fragmentation of EMP with respect to PAY is also disjoint
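The role of the two rules can be sketched by fragmenting EMP against a PAY relation that obeys them and one that does not. The rows are illustrative sample data, and `is_disjoint`, `single_valued`, and `fragment_emp` are hypothetical helper names.

```python
from itertools import combinations

def semi_join(r, s, r_idx, s_idx):
    """Tuples of r whose value at r_idx appears at s_idx in some tuple of s."""
    s_values = {t[s_idx] for t in s}
    return [t for t in r if t[r_idx] in s_values]

def is_disjoint(fragments):
    """True if no tuple appears in two different fragments."""
    return all(not (set(a) & set(b)) for a, b in combinations(fragments, 2))

def single_valued(rows, key_idx, value_idx):
    """True if each key maps to exactly one value (e.g. one salary per title)."""
    seen = {}
    return all(seen.setdefault(r[key_idx], r[value_idx]) == r[value_idx] for r in rows)

def fragment_emp(emp, pay):
    pay1 = [t for t in pay if t[1] <= 30000]  # m1: SAL <= 30000
    pay2 = [t for t in pay if t[1] > 30000]   # m2: SAL > 30000
    return [semi_join(emp, p, 1, 0) for p in (pay1, pay2)]

# Illustrative sample rows (assumed): EMP(ENO, TITLE), PAY(TITLE, SAL).
EMP = [("E1", "Elect. Eng."), ("E3", "Mech. Eng.")]
PAY_OK = [("Elect. Eng.", 40000), ("Mech. Eng.", 27000)]
PAY_BAD = PAY_OK + [("Mech. Eng.", 35000)]  # one title, two salaries

print(single_valued(PAY_OK, 0, 1), is_disjoint(fragment_emp(EMP, PAY_OK)))    # True True
print(single_valued(PAY_BAD, 0, 1), is_disjoint(fragment_emp(EMP, PAY_BAD)))  # False False
```

In the second case the Mech. Eng. title appears in both PAY fragments, so employee E3 joins with owner tuples in different fragments and lands in both EMP fragments.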