1
Distributed Databases
CS347
Lecture 14
May 30, 2001
2
Topics for the Day
• Query processing in distributed databases
– Localization
– Distributed query operators
– Cost-based optimization
3
Query Processing Steps
• Decomposition
– Given SQL query, generate one or more algebraic
query trees
• Localization
– Rewrite query trees, replacing relations by fragments
• Optimization
– Given cost model + one or more localized query
trees
– Produce minimum cost query execution plan
4
Decomposition
• Same as in a centralized DBMS
• Normalization (usually into relational algebra)
Select A,C
From R Natural Join S
Where (R.B = 1 and S.D = 2) or (R.C > 3 and S.D = 2)
⇒ σ_{(R.B = 1 ∨ R.C > 3) ∧ (S.D = 2)} (R ⋈ S)
with the selection condition in conjunctive normal form
5
Decomposition
• Redundancy elimination
(S.A = 1) ∧ (S.A > 5) ⇒ False
(S.A < 10) ∧ (S.A < 5) ⇒ S.A < 5
• Algebraic rewriting
– Example: pushing conditions down
σ_{cond} (S ⋈ T) ⇒ σ_{cond3} (σ_{cond1}(S) ⋈ σ_{cond2}(T))
6
Localization Steps
1. Start with query tree
2. Replace relations by fragments
3. Push ∪ up & σ, π down (CS245 rules)
4. Simplify – eliminating unnecessary operations
Note: fragments in query trees are denoted [R: cond], where R is the
relation the fragment belongs to and cond is the condition its tuples satisfy
7
Example 1
σ_{E=3}(R)
⇒ σ_{E=3}([R: E<10] ∪ [R: E ≥ 10])
⇒ σ_{E=3}[R: E<10] ∪ σ_{E=3}[R: E ≥ 10]
⇒ σ_{E=3}[R: E<10]
8
Example 2
R ⋈_A S
⇒ (R1 ∪ R2 ∪ R3) ⋈_A (S1 ∪ S2)
where R1 = [R: A<5], R2 = [R: 5 ≤ A ≤ 10], R3 = [R: A>10], S1 = [S: A<5], S2 = [S: A ≥ 5]
9
⇒ (R1 ⋈_A S1) ∪ (R1 ⋈_A S2) ∪ (R2 ⋈_A S1) ∪ (R2 ⋈_A S2) ∪ (R3 ⋈_A S1) ∪ (R3 ⋈_A S2)
⇒ ([R: A<5] ⋈_A [S: A<5]) ∪ ([R: 5 ≤ A ≤ 10] ⋈_A [S: A ≥ 5]) ∪ ([R: A>10] ⋈_A [S: A ≥ 5])
10
Rules for Horizontal Fragmentation
• σ_{C1}[R: C2] ⇒ [R: C1 ∧ C2]
• [R: False] ⇒ Ø
• [R: C1] ⋈_A [S: C2] ⇒ [R ⋈_A S: C1 ∧ C2 ∧ R.A = S.A]
• In Example 1:
σ_{E=3}[R2: E ≥ 10] ⇒ [R2: E=3 ∧ E ≥ 10]
⇒ [R2: False] ⇒ Ø
• In Example 2:
[R: A<5] ⋈_A [S: A ≥ 5]
⇒ [R ⋈_A S: R.A < 5 ∧ S.A ≥ 5 ∧ R.A = S.A]
⇒ [R ⋈_A S: False] ⇒ Ø
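The simplification in Example 2 can be sketched in code. This is a minimal illustration, not part of the lecture: it assumes every fragment condition is a single interval on the join attribute A over a dense (real-valued) domain, and the names `overlaps` and `surviving_joins` are made up for the example.

```python
from itertools import product

INF = float("inf")

# Each fragment condition is an interval on A:
# (low, high, low_inclusive, high_inclusive). This encoding is an assumption.
R_FRAGMENTS = {
    "R1": (-INF, 5, False, False),   # [R: A < 5]
    "R2": (5, 10, True, True),       # [R: 5 <= A <= 10]
    "R3": (10, INF, False, False),   # [R: A > 10]
}
S_FRAGMENTS = {
    "S1": (-INF, 5, False, False),   # [S: A < 5]
    "S2": (5, INF, True, False),     # [S: A >= 5]
}

def contains(interval, x):
    """True if x satisfies the interval condition."""
    low, high, low_inc, high_inc = interval
    above = x > low or (x == low and low_inc)
    below = x < high or (x == high and high_inc)
    return above and below

def overlaps(f, g):
    """True if some value of A can satisfy both conditions (C1 and C2 and R.A = S.A)."""
    low, high = max(f[0], g[0]), min(f[1], g[1])
    if low != high:
        return low < high
    # The intervals touch at one point; it must belong to both.
    return contains(f, low) and contains(g, low)

def surviving_joins(r_fragments, s_fragments):
    """Drop every fragment pair whose combined condition is False."""
    return [
        (r, s)
        for (r, fr), (s, fs) in product(r_fragments.items(), s_fragments.items())
        if overlaps(fr, fs)
    ]

print(surviving_joins(R_FRAGMENTS, S_FRAGMENTS))
# [('R1', 'S1'), ('R2', 'S2'), ('R3', 'S2')]
```

Of the six fragment pairings, only the three whose conditions can hold together remain, which matches the final tree of Example 2.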
11
Example 3 – Derived Fragmentation
R ⋈_K S
S’s fragmentation is derived from that of R.
⇒ (R1 ∪ R2) ⋈_K (S1 ∪ S2)
where R1 = [R: A<10], R2 = [R: A ≥ 10], S1 = [S: K=R.K ∧ R.A<10], S2 = [S: K=R.K ∧ R.A ≥ 10]
12
⇒ (R1 ⋈_K S1) ∪ (R1 ⋈_K S2) ∪ (R2 ⋈_K S1) ∪ (R2 ⋈_K S2)
⇒ ([R: A<10] ⋈_K [S: K=R.K ∧ R.A<10]) ∪ ([R: A ≥ 10] ⋈_K [S: K=R.K ∧ R.A ≥ 10])
13
Example 4 – Vertical Fragmentation
π_A(R)
⇒ π_A(R1(K,A,B) ⋈_K R2(K,C,D))
⇒ π_A(π_{K,A}(R1(K,A,B)) ⋈_K π_{K,A}(R2(K,C,D)))
⇒ π_A(R1(K,A,B))
14
Rule for Vertical Fragmentation
• Given vertical fragmentation of R(A):
Ri = π_{Ai}(R), Ai ⊆ A
• For any B ⊆ A:
π_B(R) = π_B [ ⋈_i Ri | B ∩ Ai ≠ Ø ]
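As a small sketch of this rule, the function below picks the fragments that have to be joined for a given projection. The function name and the dictionary layout are assumptions made for illustration; the fragment schemas are those of Example 4.

```python
def fragments_needed(projection_attrs, fragments):
    """Return the fragments Ri whose attribute set Ai intersects B."""
    wanted = set(projection_attrs)
    return [name for name, attrs in fragments.items() if wanted & set(attrs)]

# Vertical fragmentation from Example 4: R1(K, A, B) and R2(K, C, D).
FRAGMENTS = {"R1": ("K", "A", "B"), "R2": ("K", "C", "D")}

print(fragments_needed(["A"], FRAGMENTS))       # ['R1']
print(fragments_needed(["A", "C"], FRAGMENTS))  # ['R1', 'R2']
```

For the projection on A alone, only R1 qualifies, so the join with R2 disappears, as in Example 4.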
15
Parallel/Distributed Query Operations
• Sort
– Basic sort
– Range-partitioning sort
– Parallel external sort-merge
• Join
– Partitioned join
– Asymmetric fragment and replicate join
– General fragment and replicate join
– Semi-join programs
• Aggregation and duplicate removal
16
Parallel/distributed sort
• Input: relation R on
– single site/disk
– fragmented/partitioned by sort attribute
– fragmented/partitioned by some other attribute
• Output: sorted relation R
– single site/disk
– individual sorted fragments/partitions
17
Basic sort
• Given R(A,…) range partitioned on attribute A,
sort R on A
• Each fragment is sorted independently
• Results shipped elsewhere if necessary
[Figure: with partition boundaries 10 and 20, the fragments (7, 3), (11, 17, 14), (27, 22) are sorted locally into (3, 7), (11, 14, 17), (22, 27)]
18
Range partitioning sort
• Given R(A,…) located at one or more sites, not
fragmented on A, sort R on A
• Algorithm: range partition on A and then do basic sort
[Figure: Ra and Rb are range partitioned by the vector (a0, a1) into R1, R2, R3; each partition is sorted locally into R1s, R2s, R3s, which together form the result]
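A minimal single-process sketch of the range-partitioning sort follows. Lists stand in for sites, the function name is hypothetical, and the split of the sample keys between Ra and Rb is made up (the key values themselves are those of the basic sort figure).

```python
from bisect import bisect_left

def range_partition_sort(site_fragments, vector):
    """Range partition the sort keys, then sort each partition locally.

    site_fragments: one list of sort keys per source site (Ra, Rb, ...).
    vector: ascending split points (a0, a1, ...), giving len(vector) + 1 partitions.
    """
    partitions = [[] for _ in range(len(vector) + 1)]
    for fragment in site_fragments:
        for key in fragment:
            # Keys <= a0 go to partition 0, keys in (a0, a1] to partition 1, and so on.
            partitions[bisect_left(vector, key)].append(key)
    # Local sort at each destination site.
    return [sorted(p) for p in partitions]

RA = [7, 17, 27, 3]
RB = [11, 14, 22]
print(range_partition_sort([RA, RB], [10, 20]))
# [[3, 7], [11, 14, 17], [22, 27]]
```

Because the partitions cover disjoint, ordered key ranges, concatenating the locally sorted partitions gives the fully sorted relation.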
19
Selecting a partitioning vector
• Possible centralized approach using a “coordinator”
– Each site sends statistics about its fragment to coordinator
– Coordinator decides # of sites to use for local sort
– Coordinator computes and distributes partitioning vector
• For example,
– Statistics could be (min sort key, max sort key, # of tuples)
– Coordinator tries to choose vector that equally partitions
relation
20
Example
• Coordinator receives:
– From site 1: Min 5, Max 9, 10 tuples
– From site 2: Min 7, Max 16, 10 tuples
• Assume sort keys distributed uniformly within
[min,max] in each fragment
• Partition R into two fragments
• What is the split point k0?
[Figure: number line marked 5, 10, 15, 20, with k0 to be placed on it]
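One way to answer the question, assuming the uniform distribution is treated as continuous: site 1 contributes 10/4 = 2.5 tuples per unit of key on [5, 9], and site 2 contributes 10/9 tuples per unit on [7, 16]. A balanced split needs 10 tuples at or below k0, so 2.5 (k0 − 5) + (10/9)(k0 − 7) = 10, which gives k0 = 109/13 ≈ 8.38. The sketch below reaches the same value by bisection; the function names are illustrative, and with discrete keys the exact answer could differ slightly.

```python
def tuples_at_or_below(k, stats):
    """Estimated tuples with sort key <= k, keys uniform on [min, max] at each site."""
    total = 0.0
    for low, high, count in stats:
        fraction = min(max((k - low) / (high - low), 0.0), 1.0)
        total += count * fraction
    return total

def balanced_split(stats, tolerance=1e-9):
    """Bisection search for the key that leaves half of the tuples on each side."""
    target = sum(count for _, _, count in stats) / 2
    low = min(lo for lo, _, _ in stats)
    high = max(hi for _, hi, _ in stats)
    while high - low > tolerance:
        mid = (low + high) / 2
        if tuples_at_or_below(mid, stats) < target:
            low = mid
        else:
            high = mid
    return (low + high) / 2

# (min sort key, max sort key, # of tuples) as reported by sites 1 and 2.
STATS = [(5, 9, 10), (7, 16, 10)]
k0 = balanced_split(STATS)
print(round(k0, 2))  # 8.38
```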
21
Variations
• Different kinds of statistics
– Local partitioning vector
– Histogram
• Multiple rounds between coordinator and sites
– Sites send statistics
– Coordinator computes and distributes initial vector V
– Sites tell coordinator the number of tuples that fall in
each range of V
– Coordinator computes final partitioning vector Vf
Example (Site 1): local vector (5, 6, 8, 10), with 3, 4, 3 tuples in the successive ranges
22
Parallel external sort-merge
• Local sort
• Compute partition vector
• Merge sorted streams at final sites
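[Figure: Ra and Rb are sorted locally into Ras and Rbs; the sorted runs are cut at the partition vector (a0, a1) and sent, in order, to the sites of R1, R2, R3, which merge the incoming sorted streams into the result]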
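The three steps can be sketched in a single process as follows. Lists stand in for sites and `heapq.merge` stands in for the merge at each final site; the function name and sample data are assumptions for illustration.

```python
from bisect import bisect_right
from heapq import merge

def parallel_sort_merge(site_fragments, vector):
    """Local sort, cut at the partition vector, then merge sorted streams."""
    # Step 1: each source site sorts its own fragment (Ra -> Ras, Rb -> Rbs).
    sorted_runs = [sorted(fragment) for fragment in site_fragments]

    # Step 2: cut every sorted run at the split points; each piece stays sorted.
    streams = [[] for _ in range(len(vector) + 1)]
    for run in sorted_runs:
        start = 0
        for i, split in enumerate(vector):
            end = bisect_right(run, split)  # keys <= split go to stream i
            streams[i].append(run[start:end])
            start = end
        streams[-1].append(run[start:])

    # Step 3: each final site merges the sorted streams it receives.
    return [list(merge(*pieces)) for pieces in streams]

RA = [7, 17, 27, 3]
RB = [11, 14, 22]
print(parallel_sort_merge([RA, RB], [10, 20]))
# [[3, 7], [11, 14, 17], [22, 27]]
```

Compared with the range-partitioning sort, the sorting work happens before the data is shipped, so the final sites only merge.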
23
Parallel/distributed join
• Input: relations R, S
– May or may not be partitioned
• Output: R ⋈ S
– Result at one or more sites
24
Partitioned Join
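[Figure: Ra, Rb and Sa, Sb are each partitioned by f(A) on join attribute A into R1, R2, R3 and S1, S2, S3; site i computes the local join Ri ⋈ Si, and the local results together form the result]
Note: Works only for equi-joins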
25
Partitioned Join
• Same partition function (f) for both relations
• f can be range or hash partitioning
• Any type of local join (nested-loop, hash, merge, etc.)
can be used
• Several possible scheduling options. Example:
– partition R; partition S; join
– partition R; build local hash table for R; partition S and join
• Good partition function important
– Distribute join load evenly among sites
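A minimal sketch of a partitioned equi-join with hash partitioning and a local hash join, following the second scheduling option above. It runs in one process, with lists standing in for sites; the function names, the choice of three sites, and the convention that the join attribute is the first field of every tuple are assumptions.

```python
from collections import defaultdict

def partition(tuples, n_sites):
    """Apply the same partition function f (hash of A mod n_sites) to a relation."""
    sites = [[] for _ in range(n_sites)]
    for t in tuples:
        sites[hash(t[0]) % n_sites].append(t)
    return sites

def local_hash_join(r_part, s_part):
    """Build a hash table on the R partition, then probe it with the S tuples."""
    table = defaultdict(list)
    for r in r_part:
        table[r[0]].append(r)
    return [r + s[1:] for s in s_part for r in table.get(s[0], [])]

def partitioned_join(r, s, n_sites=3):
    r_parts = partition(r, n_sites)
    s_parts = partition(s, n_sites)
    result = []
    # Matching tuples always land at the same site, so only Ri and Si are joined.
    for r_part, s_part in zip(r_parts, s_parts):
        result.extend(local_hash_join(r_part, s_part))
    return result

R = [(3, "x"), (10, "y"), (15, "z"), (25, "w"), (32, "x")]  # R(A, C)
S = [(2, "a"), (10, "b"), (25, "c"), (30, "d")]             # S(A, B)
print(sorted(partitioned_join(R, S)))
# [(10, 'y', 'b'), (25, 'w', 'c')]
```

The scheme depends on equal join values hashing to the same site, which is why it is limited to equi-joins.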
26
Asymmetric fragment + replicate join
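[Figure: Ra, Rb are partitioned by the partition function f into R1, R2, R3; S (the union of Sa and Sb) is replicated to every site; site i computes the local join Ri ⋈ S on join attribute A, and the local results together form the result]
• Any partition function f can be used (even round-robin)
• Can be used for any kind of join, not just equi-joins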
27
General fragment + replicate join
[Figure: Ra, Rb are partitioned into R1, R2, …, Rn and each Ri is replicated m times; Sa, Sb are partitioned into S1, S2, …, Sm and each Sj is replicated n times]
28
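[Figure: all n x m pairings of R, S fragments are joined locally (R1 ⋈ S1, R2 ⋈ S1, …, Rn ⋈ S1, …, R1 ⋈ Sm, R2 ⋈ Sm, …, Rn ⋈ Sm), and the local results together form the result]
• Asymmetric F+R join is a special case of general F+R.
• Asymmetric F+R is useful when S is small.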
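A single-process sketch of the general fragment and replicate join, with an inequality join to show that no equi-join is required. Round-robin partitioning, the function names, and the sample data are all assumptions for illustration; setting m = 1 gives the asymmetric case.

```python
from itertools import product

def round_robin(tuples, n):
    """Split tuples into n fragments; any partition function would do here."""
    return [tuples[i::n] for i in range(n)]

def fragment_replicate_join(r, s, predicate, n=2, m=2):
    r_frags = round_robin(r, n)
    s_frags = round_robin(s, m)
    result = []
    # One (hypothetical) site per pairing (Ri, Sj): n x m local joins in total.
    for r_frag, s_frag in product(r_frags, s_frags):
        result.extend((x, y) for x in r_frag for y in s_frag if predicate(x, y))
    return result

R_KEYS = [1, 5, 9]
S_KEYS = [4, 8]
less_than = lambda x, y: x < y

print(sorted(fragment_replicate_join(R_KEYS, S_KEYS, less_than)))
# [(1, 4), (1, 8), (5, 8)]

# Asymmetric fragment and replicate: S is not partitioned (m = 1), only replicated.
print(sorted(fragment_replicate_join(R_KEYS, S_KEYS, less_than, n=3, m=1)))
# [(1, 4), (1, 8), (5, 8)]
```

Every R tuple meets every S tuple at exactly one site, so the union of the local results is the complete join with no duplicates.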
29
Semi-join programs
• Used to reduce communication traffic during join
processing
• R ⋈ S = (R ⋉ S) ⋈ S
= R ⋈ (S ⋉ R)
= (R ⋉ S) ⋈ (S ⋉ R)
30
Example
S:
A B
2 a
10 b
25 c
30 d
R:
A C
3 x
10 y
15 z
25 w
32 x
π_A(S) = [2,10,25,30]
R ⋉ S =
A C
10 y
25 w
Compute S ⋈ (R ⋉ S)
• Using semi-join, communication cost = 4 A + 2 (A + C) + result
• Directly joining R and S, communication cost = 4 (A + B) + result
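The example can be traced in a few lines. The sketch below assumes every attribute value costs one unit to ship, whereas the slide leaves the widths A, B, C symbolic; the variable names are illustrative.

```python
S = [(2, "a"), (10, "b"), (25, "c"), (30, "d")]             # S(A, B)
R = [(3, "x"), (10, "y"), (15, "z"), (25, "w"), (32, "x")]  # R(A, C)

# Step 1: the site of S ships the projection of S on A to the site of R (4 values).
a_values = {a for a, _ in S}

# Step 2: the site of R computes R semi-join S and ships it back (2 tuples of A, C).
r_reduced = [(a, c) for a, c in R if a in a_values]

# Step 3: the site of S joins S with the reduced R.
result = [(a, b, c) for a, b in S for ra, c in r_reduced if a == ra]

# Costs in shipped attribute values, excluding the final result.
# Assumption: each of A, B, C has width 1.
semi_join_cost = len(a_values) * 1 + len(r_reduced) * 2  # 4 A + 2 (A + C)
direct_cost = len(S) * 2                                 # 4 (A + B)

print(r_reduced)                    # [(10, 'y'), (25, 'w')]
print(result)                       # [(10, 'b', 'y'), (25, 'c', 'w')]
print(semi_join_cost, direct_cost)  # 8 8
```

With unit widths the two strategies tie on this tiny instance. The semi-join pays off when B is wide relative to A and C, or when few tuples of R survive the reduction.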
31
Comparing communication costs
• Say R is the smaller of the two relations R and S
• (R ⋉ S) ⋈ S is cheaper than R ⋈ S if
size(π_A S) + size(R ⋉ S) < size(R)
• Similar comparisons for other types of semi-joins
• Common implementation trick:
– Encode π_A S (or π_A R) as a bit vector
– 1 bit per value in the domain of attribute A
Example: 0 0 1 1 0 1 0 0 0 0 1 0 1 0 0
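A sketch of the bit-vector trick, assuming the domain of A is the integers 0 to domain_size − 1 and using a Python integer as the bit vector. The function names and the domain size of 33 are assumptions chosen to fit the join values of the earlier semi-join example.

```python
def encode(values, domain_size):
    """Bit vector with bit v set for each join value v in [0, domain_size)."""
    bits = 0
    for v in values:
        if not 0 <= v < domain_size:
            raise ValueError(f"{v} is outside the domain")
        bits |= 1 << v
    return bits

def semi_join_filter(tuples, bits):
    """Keep the tuples whose join value (first field) has its bit set."""
    return [t for t in tuples if (bits >> t[0]) & 1]

R = [(3, "x"), (10, "y"), (15, "z"), (25, "w"), (32, "x")]  # R(A, C)
vector = encode([2, 10, 25, 30], domain_size=33)            # projection of S on A

print(bin(vector).count("1"))       # 4
print(semi_join_filter(R, vector))  # [(10, 'y'), (25, 'w')]
```

The vector costs one bit per domain value regardless of how many join values S holds, so it is most attractive when the domain is small relative to the size of the shipped projection.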
32
n-way joins
• To compute R ⋈ S ⋈ T
– Semi-join program 1: R’ ⋈ S’ ⋈ T
where R’ = R ⋉ S & S’ = S ⋉ T
– Semi-join program 2: R’’ ⋈ S’ ⋈ T
where R’’ = R ⋉ S’ & S’ = S ⋉ T
– Several other options
• In general, number of options is exponential in
the number of relations
33
Other operations
• Duplicate elimination
– Sort first (in parallel), then eliminate duplicates in
the result
– Partition tuples (range or hash) and eliminate
duplicates locally
• Aggregates
– Partition by grouping attributes; compute
aggregates locally at each site
34
Example
sum(sal) group by dept
Ra:
# dept sal
1 toy 10
2 toy 20
3 sales 15
Rb:
# dept sal
4 sales 5
5 toy 20
6 mgmt 15
7 sales 10
8 mgmt 30
After partitioning on dept, site 1 holds:
# dept sal
1 toy 10
2 toy 20
5 toy 20
6 mgmt 15
8 mgmt 30
and computes:
dept sum
toy 50
mgmt 45
Site 2 holds:
# dept sal
3 sales 15
4 sales 5
7 sales 10
and computes:
dept sum
sales 30
35
Example
Aggregate during partitioning to reduce communication cost
Ra:
# dept sal
1 toy 10
2 toy 20
3 sales 15
Rb:
# dept sal
4 sales 5
5 toy 20
6 mgmt 15
7 sales 10
8 mgmt 30
Partial sums received at site 1:
dept sum
toy 30
toy 20
mgmt 45
Final sums at site 1:
dept sum
toy 50
mgmt 45
Partial sums received at site 2:
dept sum
sales 15
sales 15
Final sums at site 2:
dept sum
sales 30
Does this work for all kinds of aggregates?
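The pre-aggregation step can be sketched as below, using the example data. The assignment of departments to sites follows the example; the function names are illustrative. The approach relies on partial sums combining into the full sum. Sum, count, min and max can be combined this way, an average needs (sum, count) pairs to be shipped, and a median cannot be assembled from per-site medians.

```python
from collections import Counter

RA = [(1, "toy", 10), (2, "toy", 20), (3, "sales", 15)]
RB = [(4, "sales", 5), (5, "toy", 20), (6, "mgmt", 15),
      (7, "sales", 10), (8, "mgmt", 30)]
SITE_OF = {"toy": 1, "mgmt": 1, "sales": 2}  # partitioning on dept, as in the example

def local_partial_sums(fragment):
    """Sum sal per dept at the source site, before anything is shipped."""
    sums = Counter()
    for _, dept, sal in fragment:
        sums[dept] += sal
    return sums

def aggregate(fragments):
    """Ship one partial sum per (source site, dept) and combine at the final sites."""
    final = {1: Counter(), 2: Counter()}
    shipped = 0
    for fragment in fragments:
        for dept, partial in local_partial_sums(fragment).items():
            final[SITE_OF[dept]][dept] += partial
            shipped += 1
    return final, shipped

final, shipped = aggregate([RA, RB])
print(dict(final[1]), dict(final[2]))  # {'toy': 50, 'mgmt': 45} {'sales': 30}
print(shipped)                         # 5
```

Five partial sums are shipped instead of eight tuples; the saving grows with the number of tuples per group at each source site.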
36
Query Optimization
• Generate query execution plans (QEPs)
• Estimate cost of each QEP ($,time,…)
• Choose minimum cost QEP
• What’s different for distributed DB?
– New strategies for some operations (semi-join,
range-partitioning sort,…)
– Many ways to assign and schedule processors
– Some factors besides number of IO’s in the cost
model
37
Cost estimation
• In centralized systems - estimate sizes of
intermediate relations
• For distributed systems
– Transmission cost/time may dominate
– Account for parallelism
– Data distribution and result re-assembly cost/time
[Figure: timelines of work at each site for two plans. Labels recovered from the slide: Plan A, Plan B; T1, T2, answer; 100 IOs, 50 IOs, 70 IOs, 20 IOs]
38
Optimization in distributed DBs
• Two levels of optimization
• Global optimization
– Given localized query and cost function
– Output optimized (min. cost) QEP that includes
relational and communication operations on
fragments
• Local optimization
– At each site involved in query execution
– Portion of the QEP at a given site optimized using
techniques from centralized DB systems
39
Search strategies
1. Exhaustive (with pruning)
2. Hill climbing (greedy)
3. Query separation
40
Exhaustive with Pruning
• A fixed set of techniques for each relational
operator
• Search space = “all” possible QEPs with this set
of techniques
• Prune search space using heuristics
• Choose minimum cost QEP from rest of search
space
41
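Example
Query: R ⋈_A S ⋈_B T, with |R|>|S|>|T|
[Figure: search tree of join orders. First level: R ⋈ S, R × T, S ⋈ R, S ⋈ T, T ⋈ S, T × R. Pruning rule 1: prune because cross-product not necessary. Pruning rule 2: prune because larger relation first. Remaining plans: (S ⋈ R) ⋈ T and (T ⋈ S) ⋈ R, with each join done either by shipping a relation (ship S to R, ship T to S) or by a semi-join]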
42
Hill Climbing
• Begin with initial feasible QEP
• At each step, generate a set S of new QEPs by applying
‘transformations’ to current QEP
• Evaluate cost of each QEP in S
• Stop if no improvement is possible
• Otherwise, replace current QEP by the minimum cost
QEP from S and iterate
[Figure: sketch of the search, starting at the initial plan and moving through steps 1 and 2]
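The loop above can be written generically. In this sketch the plan names, their costs and the neighbor relation are all hypothetical; a real optimizer would generate neighbors by applying transformations to the current QEP and would estimate costs from a cost model.

```python
def hill_climb(initial, neighbors, cost):
    """Greedy descent: move to the cheapest neighbor until none is cheaper."""
    current = initial
    while True:
        candidates = neighbors(current)
        if not candidates:
            return current
        best = min(candidates, key=cost)
        if cost(best) >= cost(current):
            return current  # no improvement is possible from here
        current = best

# Hypothetical search space: four plans with made-up costs.
COSTS = {"P0": 50, "P1": 35, "P2": 25, "P3": 40}
NEIGHBORS = {"P0": ["P1", "P2"], "P1": ["P2"], "P2": ["P3"], "P3": []}

best_plan = hill_climb("P0", NEIGHBORS.__getitem__, COSTS.__getitem__)
print(best_plan, COSTS[best_plan])  # P2 25
```

The search stops at the first plan with no cheaper neighbor, so it can end in a local minimum rather than the global one.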
43
Example
R ⋈_A S ⋈_B T ⋈_C V
Rel. Site # of tuples
R 1 10
S 2 20
T 3 30
V 4 40
• Goal: minimize communication cost
• Initial plan: send all relations to one site
To site 1: cost=20+30+40= 90
To site 2: cost=10+30+40= 80
To site 3: cost=10+20+40= 70
To site 4: cost=10+20+30= 60
• Transformation: send a relation to its neighbor
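The four initial-plan costs can be reproduced directly from the table. In this sketch the communication cost is simply the number of tuples shipped, as in the example; the names are illustrative.

```python
TUPLES = {"R": 10, "S": 20, "T": 30, "V": 40}
SITE = {"R": 1, "S": 2, "T": 3, "V": 4}

def ship_all_cost(target_site):
    """Tuples shipped when every relation not already at target_site is sent there."""
    return sum(n for rel, n in TUPLES.items() if SITE[rel] != target_site)

costs = {site: ship_all_cost(site) for site in sorted(SITE.values())}
best_site = min(costs, key=costs.get)

print(costs)      # {1: 90, 2: 80, 3: 70, 4: 60}
print(best_site)  # 4
```

Site 4 is cheapest because it already holds the largest relation, V, which then never has to move.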
44
Local search
• Initial feasible plan
P0: R (1 → 4); S (2 → 4); T (3 → 4)
Compute join at site 4
• Assume following sizes: |R ⋈ S| = 20, |S ⋈ T| = 5, |T ⋈ V| = 1
45
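Transformation applied to R and S:
• Current plan: R (1 → 4) costs 10; S (2 → 4) costs 20; cost = 30
• Send R to site 2: R (1 → 2) costs 10; R ⋈ S (2 → 4) costs 20; cost = 30. No change
• Send S to site 1: S (2 → 1) costs 20; S ⋈ R (1 → 4) costs 20; cost = 40. Worse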
46
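Transformation applied to S and T:
• Current plan: S (2 → 4) costs 20; T (3 → 4) costs 30; cost = 50
• Send T to site 2: T (3 → 2) costs 30; T ⋈ S (2 → 4) costs 5; cost = 35. Improvement
• Send S to site 3: S (2 → 3) costs 20; S ⋈ T (3 → 4) costs 5; cost = 25. Improvement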
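The costs compared in this local search step can be reproduced with a small helper. It is a sketch under the example's assumptions (cost is the number of tuples shipped, and join sizes are as given in the initial plan); the function name and the dictionary labels are made up.

```python
TUPLES = {"R": 10, "S": 20, "T": 30}
JOIN_SIZE = {("R", "S"): 20, ("S", "T"): 5}

def alternatives(left, right):
    """Communication cost of three ways to get the join of left and right to site 4."""
    joined = JOIN_SIZE[(left, right)]
    return {
        "ship both to site 4": TUPLES[left] + TUPLES[right],
        f"send {left} to {right}'s site": TUPLES[left] + joined,
        f"send {right} to {left}'s site": TUPLES[right] + joined,
    }

print(alternatives("R", "S"))
# {'ship both to site 4': 30, "send R to S's site": 30, "send S to R's site": 40}
print(alternatives("S", "T"))
# {'ship both to site 4': 50, "send S to T's site": 25, "send T to S's site": 35}
```

For R and S no alternative beats the current plan, while for S and T sending S to site 3 cuts the cost from 50 to 25, which is the move hill climbing picks.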
47
Next iteration
• P1: S (2 → 3); R (1 → 4); α (3 → 4)
where α = S ⋈ T
Compute answer at site 4
• Now apply same transformation to R and α:
– Current plan: R (1 → 4); α (3 → 4)
– Send α to site 1: α (3 → 1); α ⋈ R (1 → 4)
– Send R to site 3: R (1 → 3); R ⋈ α (3 → 4)
48
Resources
• Ozsu and Valduriez. “Principles of Distributed
Database Systems” – Chapters 7, 8, and 9.

lecture14.ppt

  • 2. 2 Topics for the Day • Query processing in distributed databases – Localization – Distributed query operators – Cost-based optimization
  • 3. 3 Query Processing Steps • Decomposition – Given SQL query, generate one or more algebraic query trees • Localization – Rewrite query trees, replacing relations by fragments • Optimization – Given cost model + one or more localized query trees – Produce minimum cost query execution plan
  • 4. 4 Decomposition • Same as in a centralized DBMS • Normalization (usually into relational algebra) Select A,C From R Natural Join S Where (R.B = 1 and S.D = 2) or (R.C > 3 and S.D = 2)  (R.B = 1 v R.C > 3)  (S.D = 2) R S Conjunctive normal form
  • 5. 5 • Redundancy elimination (S.A = 1)  (S.A > 5)  False (S.A < 10)  (S.A < 5)  S.A < 5 • Algebraic Rewriting – Example: pushing conditions down Decomposition S S T T cond cond1 cond2 cond3
  • 6. 6 Localization Steps 1. Start with query tree 2. Replace relations by fragments 3. Push  up & , down (CS245 rules) 4. Simplify – eliminating unnecessary operations Note: To denote fragments in query trees [R: cond] Relation that fragment belongs to Condition its tuples satisfy
  • 7. 7 Example 1 E=3 R E=3  [R: E<10] [R: E  10]  E=3 E=3 [R: E<10] [R: E  10] E=3 [R: E<10]
  • 8. 8 Example 2 R S A [R: A<5] [R: 5  A  10] [R: A>10] [S: A<5] [S: A  5]   A R1 R2 R3 S1 S2
  • 9. 9 R1 S1 R1 S2 R2 S1 R2 S2 R3 S1 R3 S2  A A A A A A  [R:A<5][S:A<5] [R:5A10] [S:A5] [R:A>10][S:A5] A A A
  • 10. 10 Rules for Horiz. Fragmentation • C1[R: C2]  [R: C1  C2] • [R: False]  Ø • [R: C1] [S: C2]  [R S: C1  C2  R.A = S.A] • In Example 1: E=3[R2: E  10]  [R2: E=3  E  10]  [R2: False]  Ø • In Example 2: [R: A<5] [S: A  5]  [R S: R.A < 5  S.A  5  R.A = S.A]  [R S: False]  Ø A A A A A
  • 11. 11 Example 3 – Derived Fragmentation R S K S’s fragmentation is derived from that of R. [R: A<10] [R: A  10] [S: K=R.K  R.A<10] [S: K=R.K  R.A10]   K R1 R2 S2 S1
  • 12. 12  K K K K R1 S1 R1 S2 S1 S2 R2 R2  K K [R: A<10] [S: K=R.K  R.A<10] [R: A  10] [S: K=R.K  R.A  10]
  • 13. 13 Example 4 – Vertical Fragmentation A R A R1(K,A,B) R2(K,C,D) K A K,A K,A R1(K,A,B) R2(K,C,D) K A R1(K,A,B)
  • 14. 14 Rule for Vertical Fragmentation • Given vertical fragmentation of R(A): Ri = Ai(R), Ai  A • For any B  A: B (R) = B [ Ri | B  Ai  Ø] i
  • 15. 15 Parallel/Distributed Query Operations • Sort – Basic sort – Range-partitioning sort – Parallel external sort-merge • Join – Partitioned join – Asymmetric fragment and replicate join – General fragment and replicate join – Semi-join programs • Aggregation and duplicate removal
  • 16. 16 Parallel/distributed sort • Input: relation R on – single site/disk – fragmented/partitioned by sort attribute – fragmented/partitioned by some other attribute • Output: sorted relation R – single site/disk – individual sorted fragments/partitions
  • 17. 17 Basic sort • Given R(A,…) range partitioned on attribute A, sort R on A • Each fragment is sorted independently • Results shipped elsewhere if necessary 7 3 11 17 14 27 22 10 20 3 7 11 14 17 22 27 10 20
  • 18. 18 Range partitioning sort • Given R(A,….) located at one or more sites, not fragmented on A, sort R on A • Algorithm: range partition on A and then do basic sort R1s R2s R3s a0 Local sort Local sort Local sort Ra R1 R2 R3 Rb Result a1
  • 19. 19 Selecting a partitioning vector • Possible centralized approach using a “coordinator” – Each site sends statistics about its fragment to coordinator – Coordinator decides # of sites to use for local sort – Coordinator computes and distributes partitioning vector • For example, – Statistics could be (min sort key, max sort key, # of tuples) – Coordinator tries to choose vector that equally partitions relation
  • 20. 20 Example • Coordinator receives: – From site 1: Min 5, Max 9, 10 tuples – From site 2: Min 7, Max 16, 10 tuples • Assume sort keys distributed uniformly within [min,max] in each fragment • Partition R into two fragments 5 10 15 20 k0 What is k0?
  • 21. 21 Variations • Different kinds of statistics – Local partitioning vector – Histogram • Multiple rounds between coordinator and sites – Sites send statistics – Coordinator computes and distributes initial vector V – Sites tell coordinator the number of tuples that fall in each range of V – Coordinator computes final partitioning vector Vf 5 6 8 10 3 4 3 local vector # of tuples Site 1
  • 22. 22 Parallel external sort-merge • Local sort • Compute partition vector • Merge sorted streams at final sites Ra Rb Local sort Local sort Ras Rbs R1 R2 R3 a0 a1 In order Result
  • 23. 23 Parallel/distributed join Input: Relations R, S May or may not be partitioned Output: R S Result at one or more sites
  • 24. 24 Partitioned Join Local join Result f(A) Ra Rb R1 f(A) R2 R3 S1 S2 S3 Sa Sb Note: Works only for equi-joins Join attribute A
  • 25. 25 Partitioned Join • Same partition function (f) for both relations • f can be range or hash partitioning • Any type of local join (nested-loop, hash, merge, etc.) can be used • Several possible scheduling options. Example: – partition R; partition S; join – partition R; build local hash table for R; partition S and join • Good partition function important – Distribute join load evenly among sites
  • 26. 26 Asymmetric fragment + replicate join Local join Result Ra Rb R1 f R2 R3 S Sa Sb Join attribute A Partition function union • Any partition function f can be used (even round-robin) • Can be used for any kind of join, not just equi-joins S S
  • 27. 27 General fragment + replicate join Ra Rb R1 R2 Rn Partition R1 R2 Rn Replicate m copies Sa Sb S1 S2 Sm Partition S1 S2 Sm Replicate n copies
  • 28. 28 All n x m pairings of R,S fragments R1 S1 R2 S1 Rn S1 R1 Sm R2 Sm Rn Sm Result •Asymmetric F+R join is a special case of general F+R. •Asymmetric F+R is useful when S is small.
  • 29. 29 • Used to reduce communication traffic during join processing • R S = (R S) S = R (S R) = (R S) (S R) Semi-join programs
  • 30. 30 Example A(S) = [2,10,25,30] b a 2 10 c d 25 30 y x 3 10 z w 15 25 32 x A B A C S R R S = w y 10 25 Compute S (R S) • Using semi-join, communication cost = 4 A + 2 (A + C) + result • Directly joining R and S, communication cost = 4 (A + B) + result
  • 31. 31 Comparing communication costs • Say R is the smaller of the two relations R and S • (R S) S is cheaper than R S if size (AS) + size (R S) < size (R) • Similar comparisons for other types of semi-joins • Common implementation trick: – Encode AS (or AR) as a bit vector – 1 bit per domain of attribute A 0 0 1 1 0 1 0 0 0 0 1 0 1 0 0
  • 32. 32 n-way joins • To compute R S T – Semi-join program 1: R’ S’ T where R’ = R S & S’ = S T – Semi-join program 2: R’’ S’ T where R’’ = R S’ & S’ = S T – Several other options • In general, number of options is exponential in the number of relations
  • 33. 33 Other operations
      • Duplicate elimination
          – Sort first (in parallel), then eliminate duplicates in the result
          – Partition tuples (range or hash) and eliminate duplicates locally
      • Aggregates
          – Partition by grouping attributes; compute aggregates locally at each site
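The partition-based duplicate elimination can be sketched as follows, assuming hash partitioning on the whole tuple. The function name and the three-site default are illustrative.

```python
def distributed_distinct(fragments, n_sites=3):
    """Hash-partition tuples so equal tuples meet at one site, then deduplicate locally."""
    sites = [set() for _ in range(n_sites)]
    for frag in fragments:
        for row in frag:
            sites[hash(row) % n_sites].add(row)  # equal tuples always hash to the same site
    # The union of the local results contains no duplicates across sites.
    return [row for site in sites for row in site]

# Hypothetical fragments with one tuple duplicated across them
fragments = [[(1, "a"), (2, "b")], [(1, "a"), (3, "c")]]
distinct_rows = sorted(distributed_distinct(fragments))
```

No global pass is needed after the local step, because two copies of the same tuple can never land on different sites.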
  • 34. 34 Example: sum(sal) group by dept
      Ra (#, dept, sal): (1, toy, 10), (2, toy, 20), (3, sales, 15)
      Rb (#, dept, sal): (4, sales, 5), (5, toy, 20), (6, mgmt, 15), (7, sales, 10), (8, mgmt, 30)
      After partitioning by dept:
          – One site receives (1, toy, 10), (2, toy, 20), (5, toy, 20), (6, mgmt, 15), (8, mgmt, 30); local sums: toy 50, mgmt 45
          – The other site receives (3, sales, 15), (4, sales, 5), (7, sales, 10); local sum: sales 30
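  • 35. 35 Example
      Ra (#, dept, sal): (1, toy, 10), (2, toy, 20), (3, sales, 15)
      Rb (#, dept, sal): (4, sales, 5), (5, toy, 20), (6, mgmt, 15), (7, sales, 10), (8, mgmt, 30)
      Partial sums computed at the source sites:
          – From Ra: toy 30, sales 15
          – From Rb: toy 20, mgmt 45, sales 15
      After partitioning the partial sums by dept:
          – One site receives toy 30, toy 20, mgmt 45; final sums: toy 50, mgmt 45
          – The other site receives sales 15, sales 15; final sum: sales 30
      • Aggregate during partitioning to reduce communication cost
      • Does this work for all kinds of aggregates?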
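On the closing question: sum, count, min, and max can be combined directly from partial results, and an average can be recovered if each site ships a (sum, count) pair instead of a local average. An aggregate such as the median has no fixed-size partial result of this kind. The sketch below illustrates the (sum, count) approach on the example data; the function names are assumptions, and the final merge is done in one place rather than at a separate site per department, for brevity.

```python
from collections import defaultdict

Ra = [(1, "toy", 10), (2, "toy", 20), (3, "sales", 15)]
Rb = [(4, "sales", 5), (5, "toy", 20), (6, "mgmt", 15), (7, "sales", 10), (8, "mgmt", 30)]

def local_partials(rows):
    """Pre-aggregate at the source site: dept -> (sum, count)."""
    partial = defaultdict(lambda: (0, 0))
    for _, dept, sal in rows:
        s, c = partial[dept]
        partial[dept] = (s + sal, c + 1)
    return dict(partial)

def merge_partials(partials):
    """Combine the shipped (sum, count) pairs for each dept."""
    total = defaultdict(lambda: (0, 0))
    for partial in partials:
        for dept, (s, c) in partial.items():
            ts, tc = total[dept]
            total[dept] = (ts + s, tc + c)
    return dict(total)

merged = merge_partials([local_partials(Ra), local_partials(Rb)])
sums = {dept: s for dept, (s, c) in merged.items()}
avgs = {dept: s / c for dept, (s, c) in merged.items()}
```

Only five partial rows are shipped here instead of eight base tuples, and the saving grows with the number of tuples per group.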
  • 36. 36 Query Optimization
      • Generate query execution plans (QEPs)
      • Estimate cost of each QEP ($, time, …)
      • Choose minimum cost QEP
      • What’s different for distributed DB?
          – New strategies for some operations (semi-join, range-partitioning sort, …)
          – Many ways to assign and schedule processors
          – Some factors besides number of I/Os in the cost model
  • 37. 37 Cost estimation
      • In centralized systems: estimate sizes of intermediate relations
      • For distributed systems
          – Transmission cost/time may dominate
          – Account for parallelism
          – Data distribution and result re-assembly cost/time
      [Figure: work at each site for Plan A and Plan B; labels: T1, T2, answer; work amounts shown: 100 IOs, 50 IOs, 70 IOs, 20 IOs.]
  • 38. 38 Optimization in distributed DBs
      • Two levels of optimization
      • Global optimization
          – Given localized query and cost function
          – Output optimized (min. cost) QEP that includes relational and communication operations on fragments
      • Local optimization
          – At each site involved in query execution
          – Portion of the QEP at a given site optimized using techniques from centralized DB systems
  • 39. 39 Search strategies
      1. Exhaustive (with pruning)
      2. Hill climbing (greedy)
      3. Query separation
  • 40. 40 Exhaustive with Pruning
      • A fixed set of techniques for each relational operator
      • Search space = “all” possible QEPs with this set of techniques
      • Prune search space using heuristics
      • Choose minimum cost QEP from rest of search space
  • 41. 41 R S T R S RT S R S T T S TR (S R) T (T S) R |R|>|S|>|T| R S T A B 2 1 2 1 2 Prune because cross-product not necessary Prune because larger relation first 1 Ship S to R Semi-join Ship T to S Semi-join Example
  • 42. 42 Hill Climbing
      • Begin with initial feasible QEP
      • At each step, generate a set S of new QEPs by applying ‘transformations’ to current QEP
      • Evaluate cost of each QEP in S
      • Stop if no improvement is possible
      • Otherwise, replace current QEP by the minimum cost QEP from S and iterate
      [Figure: search path starting from the initial plan (x) and moving through plans 1 and 2.]
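The loop above can be written as a short generic routine. In this sketch, `neighbors` plays the role of the transformations and `cost` is the cost model; both are caller-supplied, and the integer "plans" in the usage lines are a toy stand-in for real QEPs.

```python
def hill_climb(initial_plan, neighbors, cost):
    """Greedy local search: move to the cheapest neighbor until none improves."""
    current = initial_plan
    while True:
        candidates = neighbors(current)  # the set S of transformed plans
        if not candidates:
            return current
        best = min(candidates, key=cost)
        if cost(best) >= cost(current):
            return current               # no improvement possible: stop
        current = best

# Toy stand-in: plans are integers, a transformation moves one step left or right.
toy_cost = lambda plan: (plan - 3) ** 2
toy_neighbors = lambda plan: [plan - 1, plan + 1]
best_plan = hill_climb(10, toy_neighbors, toy_cost)
```

Because the search only ever accepts improvements, it can stop at a plan that is locally but not globally cheapest.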
  • 43. 43 Example
      Query: R ⋈ S ⋈ T ⋈ V, where R and S join on A, S and T join on B, and T and V join on C

      Rel.   Site   # of tuples
      R      1      10
      S      2      20
      T      3      30
      V      4      40

      • Goal: minimize communication cost
      • Initial plan: send all relations to one site
          – To site 1: cost = 20 + 30 + 40 = 90
          – To site 2: cost = 10 + 30 + 40 = 80
          – To site 3: cost = 10 + 20 + 40 = 70
          – To site 4: cost = 10 + 20 + 30 = 60
      • Transformation: send a relation to its neighbor
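The initial-plan costs can be reproduced with a few lines of Python. The dictionary names and the function name are illustrative; the sites and tuple counts are taken from the table.

```python
SITE_OF = {"R": 1, "S": 2, "T": 3, "V": 4}
TUPLES = {"R": 10, "S": 20, "T": 30, "V": 40}

def ship_all_cost(target_site):
    """Communication cost of sending every relation not already at target_site."""
    return sum(n for rel, n in TUPLES.items() if SITE_OF[rel] != target_site)

costs = {site: ship_all_cost(site) for site in sorted(SITE_OF.values())}
best_site = min(costs, key=costs.get)
```

The cheapest target is the site that already holds the largest relation, since that relation never has to move.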
  • 44. 44 Local search
      • Initial feasible plan P0: R (1 → 4); S (2 → 4); T (3 → 4)
        Compute join at site 4
      • Assume the following result sizes: R ⋈ S: 20; S ⋈ T: 5; T ⋈ V: 1
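  • 45. 45 [Figure: three alternatives for moving R (site 1) and S (site 2) to site 4]
      – Ship R (10) from site 1 to site 4 and S (20) from site 2 to site 4: cost = 30
      – Ship R (10) from site 1 to site 2, then ship R ⋈ S (20) from site 2 to site 4: cost = 30 (no change)
      – Ship S (20) from site 2 to site 1, then ship R ⋈ S (20) from site 1 to site 4: cost = 40 (worse)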
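  • 46. 46 [Figure: three alternatives for moving S (site 2) and T (site 3) to site 4]
      – Ship S (20) from site 2 to site 4 and T (30) from site 3 to site 4: cost = 50
      – Ship T (30) from site 3 to site 2, then ship T ⋈ S (5) from site 2 to site 4: cost = 35 (improvement)
      – Ship S (20) from site 2 to site 3, then ship S ⋈ T (5) from site 3 to site 4: cost = 25 (improvement)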
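The costs of the "send a relation to its neighbor" alternatives can be checked with a small helper. The function name and the plan descriptions are illustrative; the sizes are the ones assumed in the example (R: 10, S: 20, T: 30, R ⋈ S: 20, S ⋈ T: 5).

```python
def pair_plans(x, x_size, y, y_size, join_size):
    """Communication cost of three ways to get relations x and y to the target site."""
    return {
        f"ship {x} and {y} directly": x_size + y_size,
        f"ship {x} to {y}'s site, then ship the join": x_size + join_size,
        f"ship {y} to {x}'s site, then ship the join": y_size + join_size,
    }

rs = pair_plans("R", 10, "S", 20, join_size=20)  # no alternative beats the direct cost of 30
st = pair_plans("S", 20, "T", 30, join_size=5)   # both alternatives beat the direct cost of 50
best_st = min(st, key=st.get)
```

Joining early only pays off when the join result is smaller than the relation it replaces on the final hop, which holds for S ⋈ T but not for R ⋈ S.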
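  • 47. 47 Next iteration
      • P1: S (2 → 3); R (1 → 4); α (3 → 4), where α = S ⋈ T
        Compute answer at site 4
      • Now apply the same transformation to R and α
      [Figure: three alternatives over sites 1, 3, and 4: ship R and α directly to site 4; ship α to site 1, then ship α ⋈ R to site 4; ship R to site 3, then ship R ⋈ α to site 4.]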
  • 48. 48 Resources
      • Özsu and Valduriez. “Principles of Distributed Database Systems”
          – Chapters 7, 8, and 9.