SlideShare a Scribd company logo
DBMS 2001 Notes 6: Query Compilation 1
Principles of Database
Management Systems
6: Query Compilation and
Optimization
Pekka Kilpeläinen
(partially based on Stanford CS245 slide
originals by Hector Garcia-Molina, Jeff Ullman
and Jennifer Widom)
DBMS 2001 Notes 6: Query Compilation 2
Overview
• We have studied recently:
– Algorithms for selections and joins, and their costs
(in terms of disk I/O)
• Next: A closer look at query compilation
– Parsing
– Algebraic optimization of logical query plans
– Estimating sizes of intermediate results
• Focus still on selections and joins
• Remember the overall process of query
execution:
DBMS 2001 Notes 6: Query Compilation 3
{P1,P2,…..}
{P1,C1>...}
parse
convert
apply laws
estimate result sizes
consider physical plans estimate costs
pick best
execute
Pi
answer
SQL query
parse tree
logical query plan
“improved” l.q.p
l.q.p. +sizes
statistics
DBMS 2001 Notes 6: Query Compilation 4
Step 1: Parsing
• Check syntactic correctness of the query, and
transform it into a parse tree
– Based on the formal grammar for SQL
– Inner nodes nonterminal symbols (syntactic
categories for things like <Query>, <RelName>, or
<Condition>)
– Leaves terminal symbols: names of relations or
attributes, keywords (SELECT, FROM, …),
operators (+, AND, OR, LIKE, …), operands (10,
'%1960', …)
– Also check semantic correctness: Relations and
attributes exist, operands compatible with
operators, …
DBMS 2001 Notes 6: Query Compilation 5
Example: SQL query
Consider querying a movie database with relations
StarsIn(title, year, starName)
MovieStar(name, address, gender, birthdate)
SELECT title
FROM StarsIn, MovieStar
WHERE starName = name AND birthdate LIKE ‘%1960’ ;
(Find the titles for movies with stars born in 1960)
DBMS 2001 Notes 6: Query Compilation 6
Example: Parse Tree
<Query>
<SFW>
SELECT <SelList> FROM <FromList> WHERE <Condition>
<Attribute> <RelName> , <FromList> AND
title StarsIn <RelName>
<Condition> <Condition>
<Attribute> = <Attribute> <Attribute> LIKE <Pattern>
starName name birthdate ‘%1960’
MovieStar
DBMS 2001 Notes 6: Query Compilation 7
Step 2:
Parse Tree −> Logical Query
Plan• Basic strategy:
SELECT A, B, C
FROM R1, R2
WHERE Cond ;
becomes
πA,B,C[σCond(R1 x R2)]
DBMS 2001 Notes 6: Query Compilation 8
Example: Logical Query Plan
Πtitle
σstarName=name and birthdate LIKE ‘%1960’
StarsIn MovieStar
×
DBMS 2001 Notes 6: Query Compilation 9
Step 3: Improving the L.Q.P
• Transform the logical query plan into an
equivalent form expected to lead to
better performance
– Based on laws of relational algebra
– Normal to produce a single optimized form,
which acts as input for the generation of
physical query plans
DBMS 2001 Notes 6: Query Compilation 10
Example: Improved Logical Query Plan
Πtitle
starName=name
StarsIn
σbirthdate LIKE ‘%1960’
MovieStar
DBMS 2001 Notes 6: Query Compilation 11
Step 4: Estimate Result Sizes
• Cost estimates of database algorithms
depend on sizes of input relations
−> Need to estimate sizes of
intermediate results
– Estimates based on statistics about
relations, gathered periodically or
incrementally
DBMS 2001 Notes 6: Query Compilation 12
Example: Estimate Result Sizes
Need expected size
StarsIn
MovieStar
σ
DBMS 2001 Notes 6: Query Compilation 13
Steps 5, 6, ...
• Generate and compare query plans
– generate alternate query plans P1, …, Pk by
selecting algorithms and execution orders
for relational operations
– Estimate the cost of each plan
– Choose the plan Pi estimated to be "best"
• Execute plan Pi and return its result
DBMS 2001 Notes 6: Query Compilation 14
Example: One Physical Plan
Parameters: join order,
memory size, project attributes,...
Hash join
SEQ scan index scan Parameters:
Select Condition,...
StarsIn MovieStar
DBMS 2001 Notes 6: Query Compilation 15
Next: Closer look at ...
• Transformation rules
• Estimating result sizes
DBMS 2001 Notes 6: Query Compilation 16
Relational algebra optimization
• Transformation rules
(preserve equivalence)
• What are good transformations?
DBMS 2001 Notes 6: Query Compilation 17
Rules: Natural joins
R S = S R (commutative)
(R S) T = R (S T) (associative)
• Carry attribute names in results, so order
is not important
−> Can evaluate in any order
DBMS 2001 Notes 6: Query Compilation 18
(R x S) x T = R x (S x T)
R x S = S x R
R U (S U T) = (R U S) U T
R U S = S U R
Rules:
Cross products & union similarly
(both associative & commutative):
DBMS 2001 Notes 6: Query Compilation 19
Rules: Selects
1. σp1∧p2(R) =
2. σp1vp2(R) =
σp1 [ σp2 (R)]
[ σp1 (R)] U [ σp2 (R)]
1. Especially useful (applied left-to-right):
Allows compound select conditions to be
split and moved to suitable positions
(See next)
NB: ∧ = AND; v = OR
DBMS 2001 Notes 6: Query Compilation 20
Rules: Products to Joins
Definition of Natural Join:
R S = πL[σC(R x S)] ;
Condition C equates attributes common to R
and S, and πL projects one copy of them out
Applied right-to-left
- definition of general join applied similarly
DBMS 2001 Notes 6: Query Compilation 21
Let p = predicate with only R attribs
q = predicate with only S attribs
m = predicate with attribs common to R,S
σp (R S) =
σq (R S) =
Rules: σ + combined
[σp (R)] S
R [σq (S)]
More rules can be derived...
DBMS 2001 Notes 6: Query Compilation 22
σp∧q (R S) = [σp (R)] [σq (S)]
σp∧q∧m (R S) =
σm [(σp R) (σq S)]
σpvq (R S) =
[(σp R) S] U [R (σq S)]
Derived rules for σ +
DBMS 2001 Notes 6: Query Compilation 23
Derivation for first one;
Others for homework:
σp∧q (R S) =
σp [σq (R S) ] =
σp [(R σq (S) ] =
[σp (R)] [σq (S)]
DBMS 2001 Notes 6: Query Compilation 24
Rules for σ, π combined with X
similar...
e.g., σp (R X S) = ?
DBMS 2001 Notes 6: Query Compilation 25
σp1∧p2 (R) → σp1 [σp2 (R)]
σp (R S) → [σp (R)] S
R S → S R
Some “good” transformations:
• No transformation is always good
• Usually good: early selections
DBMS 2001 Notes 6: Query Compilation 26
Outline - Query Processing
• Relational algebra level
– transformations
– good transformations
• Detailed query plan level
– estimate costs
– generate and compare plans
next
DBMS 2001 Notes 6: Query Compilation 27
• Estimating cost of query plan
(1) Estimating size of intermediate results
−>
(2) Estimating # of I/Os
(considered last week)
DBMS 2001 Notes 6: Query Compilation 28
Estimating result size
• Maintain statistics for relation R
– T(R) : # tuples in R
– L(R) : Length of rows,
# of bytes in each R tuple
– B(R): # of blocks to hold all R tuples
– V(R, A) :
# distinct values in R for attribute A
DBMS 2001 Notes 6: Query Compilation 29
Example
R A: 20 byte string
B: 4 byte integer
C: 8 byte date
D: 5 byte string
A B C D
cat 1 10 a
cat 1 20 b
dog 1 30 a
dog 1 40 c
bat 1 50 d
T(R) = 5 L(R) = 37
V(R,A) = 3 V(R,C) = 5
V(R,B) = 1 V(R,D) = 4
DBMS 2001 Notes 6: Query Compilation 30
Size estimates for product W = R1xR2
T(W) =
L(W) =
T(R1) × T(R2)
L(R1) + L(R2)
DBMS 2001 Notes 6: Query Compilation 31
L(W) = L(R)
T(W) = ?
Estimates for selection W = σA=a (R)
= AVG number of tuples that satisfy
an equality condition on R.A
= …
DBMS 2001 Notes 6: Query Compilation 32
Example
R V(R,A)=3
V(R,B)=1
V(R,C)=5
V(R,D)=4
AVG size of, say, σA=val(R)?
T(σA=cat(R))=2, T(σA=dog(R))=2, T(σA=bat(R))=1
=> (2+2+1)/3 = 5/3
A B C D
cat 1 10 a
cat 1 20 b
dog 1 30 a
dog 1 40 c
bat 1 50 d
DBMS 2001 Notes 6: Query Compilation 33
Size of W = σZ=a(R) in general:
Assume: Only existing Z values a1, a2, …
are used in a select expression Z= ai,
each with equal probalility 1/V(R,Z)
=> the expected size of σZ=val(R) is
E(T(W)) = Σi 1/V(R,Z) * T(σZ=a
i (R))
= T(R)/V(R,Z)
DBMS 2001 Notes 6: Query Compilation 34
What about selection W=σz ≥ val (R) ?
T(W) = ?
• Estimate 1: T(W) = T(R)/2
– Rationale: On avg, 1/2 of tuples satisfy the condition
• Estimate 2: T(W) = T(R)/3
– Rationale: Acknowledges the tendency of
selecting "interesting" (e.g., rare tuples) more
frequently
<, ≤ and < similary
DBMS 2001 Notes 6: Query Compilation 35
Size estimates for W = R1 R2
Let X = attributes of R1
Y = attributes of R2
X ∩ Y = ∅
Same as R1 x R2
Case 1
DBMS 2001 Notes 6: Query Compilation 36
W = R1 R2 X ∩ Y = A
R1 A B C R2 A D
Case 2
Assumption:
V(R1,A) ≤ V(R2,A) ⇒ πA(R1) ⊆ πA(R2)
V(R2,A) ≤ V(R1,A) ⇒ πA(R2) ⊆ πA(R1)
“Containment of value sets” [Sec. 7.4.4]
DBMS 2001 Notes 6: Query Compilation 37
Why should “containment of value sets” hold?
Consider joining relations
Faculties(FName, …) and
Depts(DName, FName, …)
where Depts.FName is a foreign key, and
faculties can have 0,…,n departments.
Now V(Depts, FName) ≤ V(Faculties,
FName), and referential integrity requires that
πFName(Depts) ⊆ πFName(Faculties)
DBMS 2001 Notes 6: Query Compilation 38
R1 A B C R2 A D
a
Estimating T(W) when V(R1,A) ≤ V(R2,A)
Take
1 tuple
Match
Each estimated to match with T(R2)/V(R2,A)
tuples ...
so T(W) = T(R1)×T(R2)/V(R2, A)
DBMS 2001 Notes 6: Query Compilation 39
• V(R1,A) ≤ V(R2,A): T(W) = T(R1)T(R2)/V(R2,A)
• V(R2,A) ≤ V(R1,A): T(W) = T(R2)T(R1)/V(R1,A)
[A is the join attribute]
=> in general:
T(W) = T(R1)T(R2)/max{V(R1,A), V(R2,A) }
DBMS 2001 Notes 6: Query Compilation 40
With similar ideas, can estimate sizes of:
• W = ΠAB (R) ….. [Sec. 7.4.2]
• W = σA=a∧B=b (R) = σA=a (σB=b (R));
Ass. A and B independent =>
T(W) = T(R)/(V(R, A) x V(R, B))
• W=R(A,X,Y) S(X,Y,B); [Sec. 7.4.5]
Ass. X and Y independent => …
T(W) = T(R)T(S)/(max{V(R,X), V(S,X)} x
max{V(R,Y), V(S,Y)})
• Union, intersection, diff, …. [Sec. 7.4.7]
DBMS 2001 Notes 6: Query Compilation 41
Note: for complex expressions, need
intermediate estimates.
E.g. W = [σA=a (R1) ] R2
Treat as relation U
T(U) = T(R1)/V(R1,A) L(U) = L(R1)
Also need an estimate for V(U, Ai) !
DBMS 2001 Notes 6: Query Compilation 42
To estimate V (U, Ai)
E.g., U = σA=a (R)
Say R has attributes A,B,C,D
V(U, A) = ?
V(U, B) = ?
V(U, C) = ?
V(U, D) = ?
DBMS 2001 Notes 6: Query Compilation 43
Example
R V(R,A)=3
V(R,B)=1
V(R,C)=T(R)=5
V(R,D)=3
U = σA=x (R)
A B C D
cat 1 10 10
cat 1 20 20
dog 1 30 10
dog 1 40 30
bat 1 50 10
V(U, D) = V(R,D)/T(R,A) ... V(R,D)
V(U,A) =1 V(U,B) =1 V(U,C) = T(R)
V(R,A)
DBMS 2001 Notes 6: Query Compilation 44
V(U,Ai) for Joins U = R1(A,B) R2(A,C)
V(U,A) = min { V(R1, A), V(R2, A) }
V(U,B) = V(R1, B)
V(U,C) = V(R2, C)
[Assumption called
“preservation of value sets”, sect. 7.4.4]
Values of non-join
attributes preserved
DBMS 2001 Notes 6: Query Compilation 45
Example:
Z = R1(A,B) R2(B,C) R3(C,D)
• R1: T(R1) = 1000 V(R1,A)=50 V(R1,B)=100
• R2: T(R2) = 2000 V(R2,B)=200 V(R2,C)=300
• R3: T(R3) = 3000 V(R3,C)=90 V(R3,D)=500
DBMS 2001 Notes 6: Query Compilation 46
T(U) = T(R1) ×T(R2)/max{V(R1,B), V(R2,B)}
= 1000×2000/200 = 10,000;
V(U,A) = V(R1,A) = 50
V(U,C) = V(R2,C) = 300
V(U,B) = min{V(R1,B), V(R2,B)}= 100
Intermediate Result:
U(A,B,C)= R1(A,B) R2(B,C)
DBMS 2001 Notes 6: Query Compilation 47
Z = U(A,B,C) R3(C,D)
T(Z) = T(U)×T(R3)/ max{V(U,C), V(R3,C)}
= 10,000×3000/300
= 100,000
V(Z,A) = V(U,A) = 50
V(Z,B) = V(U,B) = 100
V(Z,C) = min{V(U,C), V(R3,C)} = 90
V(Z,D) = V(R3,D) =500
DBMS 2001 Notes 6: Query Compilation 48
Outline/Summary
• Estimating cost of query plan
– Estimating size of results done!
– Estimating # of IOs last week
• Generate and compare plans
– skip this
• Execute physical operations of query plan
– Sketched last week

More Related Content

What's hot

2014-06-20 Multinomial Logistic Regression with Apache Spark
2014-06-20 Multinomial Logistic Regression with Apache Spark2014-06-20 Multinomial Logistic Regression with Apache Spark
2014-06-20 Multinomial Logistic Regression with Apache Spark
DB Tsai
 
Lecture 10 data structures and algorithms
Lecture 10 data structures and algorithmsLecture 10 data structures and algorithms
Lecture 10 data structures and algorithmsAakash deep Singhal
 
Efficient Hill Climber for Multi-Objective Pseudo-Boolean Optimization
Efficient Hill Climber for Multi-Objective Pseudo-Boolean OptimizationEfficient Hill Climber for Multi-Objective Pseudo-Boolean Optimization
Efficient Hill Climber for Multi-Objective Pseudo-Boolean Optimization
jfrchicanog
 
Spark summit talk, july 2014 powered by reveal
Spark summit talk, july 2014 powered by revealSpark summit talk, july 2014 powered by reveal
Spark summit talk, july 2014 powered by reveal
Debasish Das
 
Advance control theory
Advance control theoryAdvance control theory
Advance control theory
SHIMI S L
 
[Www.pkbulk.blogspot.com]dbms13
[Www.pkbulk.blogspot.com]dbms13[Www.pkbulk.blogspot.com]dbms13
[Www.pkbulk.blogspot.com]dbms13
AnusAhmad
 
5.4 randomized datastructures
5.4 randomized datastructures5.4 randomized datastructures
5.4 randomized datastructures
Krish_ver2
 
CS 542 -- Query Execution
CS 542 -- Query ExecutionCS 542 -- Query Execution
CS 542 -- Query ExecutionJ Singh
 

What's hot (9)

2014-06-20 Multinomial Logistic Regression with Apache Spark
2014-06-20 Multinomial Logistic Regression with Apache Spark2014-06-20 Multinomial Logistic Regression with Apache Spark
2014-06-20 Multinomial Logistic Regression with Apache Spark
 
Lecture 10 data structures and algorithms
Lecture 10 data structures and algorithmsLecture 10 data structures and algorithms
Lecture 10 data structures and algorithms
 
Efficient Hill Climber for Multi-Objective Pseudo-Boolean Optimization
Efficient Hill Climber for Multi-Objective Pseudo-Boolean OptimizationEfficient Hill Climber for Multi-Objective Pseudo-Boolean Optimization
Efficient Hill Climber for Multi-Objective Pseudo-Boolean Optimization
 
Spark summit talk, july 2014 powered by reveal
Spark summit talk, july 2014 powered by revealSpark summit talk, july 2014 powered by reveal
Spark summit talk, july 2014 powered by reveal
 
Advance control theory
Advance control theoryAdvance control theory
Advance control theory
 
[Www.pkbulk.blogspot.com]dbms13
[Www.pkbulk.blogspot.com]dbms13[Www.pkbulk.blogspot.com]dbms13
[Www.pkbulk.blogspot.com]dbms13
 
5.4 randomized datastructures
5.4 randomized datastructures5.4 randomized datastructures
5.4 randomized datastructures
 
Lecture23
Lecture23Lecture23
Lecture23
 
CS 542 -- Query Execution
CS 542 -- Query ExecutionCS 542 -- Query Execution
CS 542 -- Query Execution
 

Viewers also liked

Relazioni brixen 2016
Relazioni brixen 2016Relazioni brixen 2016
Relazioni brixen 2016
IstitutoRete
 
Berkeley VC Course Completion Certificate 2016
Berkeley VC Course Completion Certificate 2016Berkeley VC Course Completion Certificate 2016
Berkeley VC Course Completion Certificate 2016Mitchell Weinstock
 
How to Send Fax From BlackBerry Smartphones
How to Send Fax From BlackBerry SmartphonesHow to Send Fax From BlackBerry Smartphones
How to Send Fax From BlackBerry Smartphones
alooftwaddle1646
 
The Adbrand Battles Presentation Final Draft
The Adbrand Battles Presentation Final DraftThe Adbrand Battles Presentation Final Draft
The Adbrand Battles Presentation Final Draft
Eric Floresca
 
Sitios web
Sitios webSitios web
Sitios web
Camilo Dorado
 
Catalogo 2013 aplic_por_correias_poly_v
Catalogo 2013 aplic_por_correias_poly_vCatalogo 2013 aplic_por_correias_poly_v
Catalogo 2013 aplic_por_correias_poly_v
carlosmendelski
 
Onlinechat
OnlinechatOnlinechat
Onlinechat
shavetaverma
 
Tabela de-parafusos1
Tabela de-parafusos1Tabela de-parafusos1
Tabela de-parafusos1
lauromatozinho
 
The Seven Kinds of Security
The Seven Kinds of SecurityThe Seven Kinds of Security
The Seven Kinds of Security
Veracode
 
PLANTAS VASCULARES SIN SEMILLA
PLANTAS VASCULARES SIN SEMILLAPLANTAS VASCULARES SIN SEMILLA
PLANTAS VASCULARES SIN SEMILLA
william tito nina
 
Gabaritos para Imprimir - Tamanho Real de Parafusos, Porcas e Arruelas
Gabaritos para Imprimir - Tamanho Real de Parafusos, Porcas e ArruelasGabaritos para Imprimir - Tamanho Real de Parafusos, Porcas e Arruelas
Gabaritos para Imprimir - Tamanho Real de Parafusos, Porcas e Arruelas
RH Indufix Fixadores
 
Back-to-School market trends + recommendations
Back-to-School market trends + recommendations Back-to-School market trends + recommendations
Back-to-School market trends + recommendations
VaynerMedia
 

Viewers also liked (15)

Agenda
AgendaAgenda
Agenda
 
Relazioni brixen 2016
Relazioni brixen 2016Relazioni brixen 2016
Relazioni brixen 2016
 
Berkeley VC Course Completion Certificate 2016
Berkeley VC Course Completion Certificate 2016Berkeley VC Course Completion Certificate 2016
Berkeley VC Course Completion Certificate 2016
 
How to Send Fax From BlackBerry Smartphones
How to Send Fax From BlackBerry SmartphonesHow to Send Fax From BlackBerry Smartphones
How to Send Fax From BlackBerry Smartphones
 
The Adbrand Battles Presentation Final Draft
The Adbrand Battles Presentation Final DraftThe Adbrand Battles Presentation Final Draft
The Adbrand Battles Presentation Final Draft
 
CV Iain Lewis
CV Iain LewisCV Iain Lewis
CV Iain Lewis
 
eng
engeng
eng
 
Sitios web
Sitios webSitios web
Sitios web
 
Catalogo 2013 aplic_por_correias_poly_v
Catalogo 2013 aplic_por_correias_poly_vCatalogo 2013 aplic_por_correias_poly_v
Catalogo 2013 aplic_por_correias_poly_v
 
Onlinechat
OnlinechatOnlinechat
Onlinechat
 
Tabela de-parafusos1
Tabela de-parafusos1Tabela de-parafusos1
Tabela de-parafusos1
 
The Seven Kinds of Security
The Seven Kinds of SecurityThe Seven Kinds of Security
The Seven Kinds of Security
 
PLANTAS VASCULARES SIN SEMILLA
PLANTAS VASCULARES SIN SEMILLAPLANTAS VASCULARES SIN SEMILLA
PLANTAS VASCULARES SIN SEMILLA
 
Gabaritos para Imprimir - Tamanho Real de Parafusos, Porcas e Arruelas
Gabaritos para Imprimir - Tamanho Real de Parafusos, Porcas e ArruelasGabaritos para Imprimir - Tamanho Real de Parafusos, Porcas e Arruelas
Gabaritos para Imprimir - Tamanho Real de Parafusos, Porcas e Arruelas
 
Back-to-School market trends + recommendations
Back-to-School market trends + recommendations Back-to-School market trends + recommendations
Back-to-School market trends + recommendations
 

Similar to Compilation

Intro to relational model
Intro to relational modelIntro to relational model
Intro to relational model
ATS SBGI MIRAJ
 
CS 542 -- Query Optimization
CS 542 -- Query OptimizationCS 542 -- Query Optimization
CS 542 -- Query OptimizationJ Singh
 
dld 01-introduction
dld 01-introductiondld 01-introduction
dld 01-introduction
United International University
 
Query Optimization - Brandon Latronica
Query Optimization - Brandon LatronicaQuery Optimization - Brandon Latronica
Query Optimization - Brandon Latronica
"FENG "GEORGE"" YU
 
S1 - Process product optimization using design experiments and response surfa...
S1 - Process product optimization using design experiments and response surfa...S1 - Process product optimization using design experiments and response surfa...
S1 - Process product optimization using design experiments and response surfa...
CAChemE
 
DBMS ArchitectureQuery ExecutorBuffer ManagerStora
DBMS ArchitectureQuery ExecutorBuffer ManagerStoraDBMS ArchitectureQuery ExecutorBuffer ManagerStora
DBMS ArchitectureQuery ExecutorBuffer ManagerStora
LinaCovington707
 
relational algebra and it's implementation
relational algebra and it's implementationrelational algebra and it's implementation
relational algebra and it's implementation
dbmscse61
 
Unit-1 Basic Concept of Algorithm.pptx
Unit-1 Basic Concept of Algorithm.pptxUnit-1 Basic Concept of Algorithm.pptx
Unit-1 Basic Concept of Algorithm.pptx
ssuser01e301
 
learned optimizer.pptx
learned optimizer.pptxlearned optimizer.pptx
learned optimizer.pptx
Qingsong Guo
 
lecture14.ppt
lecture14.pptlecture14.ppt
lecture14.ppt
SivaSankaran81
 
Adbms 40 heuristics in query optimization
Adbms 40 heuristics in query optimizationAdbms 40 heuristics in query optimization
Adbms 40 heuristics in query optimization
Vaibhav Khanna
 
Lecture 06 relational algebra and calculus
Lecture 06 relational algebra and calculusLecture 06 relational algebra and calculus
Lecture 06 relational algebra and calculusemailharmeet
 
Relational Algebra1.ppt
Relational Algebra1.pptRelational Algebra1.ppt
Relational Algebra1.ppt
20DCE031RAJGANATRA
 
Relational Algebra
Relational AlgebraRelational Algebra
Relational Algebra
Amin Omi
 
MOD2-DBMS.pdf
MOD2-DBMS.pdfMOD2-DBMS.pdf
MOD2-DBMS.pdf
RohitKumarSahoo5
 
R for Statistical Computing
R for Statistical ComputingR for Statistical Computing
R for Statistical Computing
Mohammed El Rafie Tarabay
 
Module 2 - part i
Module   2 - part iModule   2 - part i
Module 2 - part i
ParthNavale
 

Similar to Compilation (20)

Intro to relational model
Intro to relational modelIntro to relational model
Intro to relational model
 
CS 542 -- Query Optimization
CS 542 -- Query OptimizationCS 542 -- Query Optimization
CS 542 -- Query Optimization
 
dld 01-introduction
dld 01-introductiondld 01-introduction
dld 01-introduction
 
R Language Introduction
R Language IntroductionR Language Introduction
R Language Introduction
 
Query Optimization - Brandon Latronica
Query Optimization - Brandon LatronicaQuery Optimization - Brandon Latronica
Query Optimization - Brandon Latronica
 
S1 - Process product optimization using design experiments and response surfa...
S1 - Process product optimization using design experiments and response surfa...S1 - Process product optimization using design experiments and response surfa...
S1 - Process product optimization using design experiments and response surfa...
 
DBMS ArchitectureQuery ExecutorBuffer ManagerStora
DBMS ArchitectureQuery ExecutorBuffer ManagerStoraDBMS ArchitectureQuery ExecutorBuffer ManagerStora
DBMS ArchitectureQuery ExecutorBuffer ManagerStora
 
Algebra
AlgebraAlgebra
Algebra
 
relational algebra and it's implementation
relational algebra and it's implementationrelational algebra and it's implementation
relational algebra and it's implementation
 
Unit-1 Basic Concept of Algorithm.pptx
Unit-1 Basic Concept of Algorithm.pptxUnit-1 Basic Concept of Algorithm.pptx
Unit-1 Basic Concept of Algorithm.pptx
 
learned optimizer.pptx
learned optimizer.pptxlearned optimizer.pptx
learned optimizer.pptx
 
lecture14.ppt
lecture14.pptlecture14.ppt
lecture14.ppt
 
Adbms 40 heuristics in query optimization
Adbms 40 heuristics in query optimizationAdbms 40 heuristics in query optimization
Adbms 40 heuristics in query optimization
 
Lecture 06 relational algebra and calculus
Lecture 06 relational algebra and calculusLecture 06 relational algebra and calculus
Lecture 06 relational algebra and calculus
 
Relational Algebra1.ppt
Relational Algebra1.pptRelational Algebra1.ppt
Relational Algebra1.ppt
 
Relational Algebra
Relational AlgebraRelational Algebra
Relational Algebra
 
MOD2-DBMS.pdf
MOD2-DBMS.pdfMOD2-DBMS.pdf
MOD2-DBMS.pdf
 
R for Statistical Computing
R for Statistical ComputingR for Statistical Computing
R for Statistical Computing
 
Query compiler
Query compilerQuery compiler
Query compiler
 
Module 2 - part i
Module   2 - part iModule   2 - part i
Module 2 - part i
 

Compilation

  • 1. DBMS 2001 Notes 6: Query Compilation 1 Principles of Database Management Systems 6: Query Compilation and Optimization Pekka Kilpeläinen (partially based on Stanford CS245 slide originals by Hector Garcia-Molina, Jeff Ullman and Jennifer Widom)
  • 2. DBMS 2001 Notes 6: Query Compilation 2 Overview • We have studied recently: – Algorithms for selections and joins, and their costs (in terms of disk I/O) • Next: A closer look at query compilation – Parsing – Algebraic optimization of logical query plans – Estimating sizes of intermediate results • Focus still on selections and joins • Remember the overall process of query execution:
  • 3. DBMS 2001 Notes 6: Query Compilation 3 {P1,P2,…..} {P1,C1>...} parse convert apply laws estimate result sizes consider physical plans estimate costs pick best execute Pi answer SQL query parse tree logical query plan “improved” l.q.p l.q.p. +sizes statistics
  • 4. DBMS 2001 Notes 6: Query Compilation 4 Step 1: Parsing • Check syntactic correctness of the query, and transform it into a parse tree – Based on the formal grammar for SQL – Inner nodes nonterminal symbols (syntactic categories for things like <Query>, <RelName>, or <Condition>) – Leaves terminal symbols: names of relations or attributes, keywords (SELECT, FROM, …), operators (+, AND, OR, LIKE, …), operands (10, '%1960', …) – Also check semantic correctness: Relations and attributes exist, operands compatible with operators, …
  • 5. DBMS 2001 Notes 6: Query Compilation 5 Example: SQL query Consider querying a movie database with relations StarsIn(title, year, starName) MovieStar(name, address, gender, birthdate) SELECT title FROM StarsIn, MovieStar WHERE starName = name AND birthdate LIKE ‘%1960’ ; (Find the titles for movies with stars born in 1960)
  • 6. DBMS 2001 Notes 6: Query Compilation 6 Example: Parse Tree <Query> <SFW> SELECT <SelList> FROM <FromList> WHERE <Condition> <Attribute> <RelName> , <FromList> AND title StarsIn <RelName> <Condition> <Condition> <Attribute> = <Attribute> <Attribute> LIKE <Pattern> starName name birthdate ‘%1960’ MovieStar
  • 7. DBMS 2001 Notes 6: Query Compilation 7 Step 2: Parse Tree −> Logical Query Plan• Basic strategy: SELECT A, B, C FROM R1, R2 WHERE Cond ; becomes πA,B,C[σCond(R1 x R2)]
  • 8. DBMS 2001 Notes 6: Query Compilation 8 Example: Logical Query Plan Πtitle σstarName=name and birthdate LIKE ‘%1960’ StarsIn MovieStar ×
  • 9. DBMS 2001 Notes 6: Query Compilation 9 Step 3: Improving the L.Q.P • Transform the logical query plan into an equivalent form expected to lead to better performance – Based on laws of relational algebra – Normal to produce a single optimized form, which acts as input for the generation of physical query plans
  • 10. DBMS 2001 Notes 6: Query Compilation 10 Example: Improved Logical Query Plan Πtitle starName=name StarsIn σbirthdate LIKE ‘%1960’ MovieStar
  • 11. DBMS 2001 Notes 6: Query Compilation 11 Step 4: Estimate Result Sizes • Cost estimates of database algorithms depend on sizes of input relations −> Need to estimate sizes of intermediate results – Estimates based on statistics about relations, gathered periodically or incrementally
  • 12. DBMS 2001 Notes 6: Query Compilation 12 Example: Estimate Result Sizes Need expected size StarsIn MovieStar σ
  • 13. DBMS 2001 Notes 6: Query Compilation 13 Steps 5, 6, ... • Generate and compare query plans – generate alternate query plans P1, …, Pk by selecting algorithms and execution orders for relational operations – Estimate the cost of each plan – Choose the plan Pi estimated to be "best" • Execute plan Pi and return its result
  • 14. DBMS 2001 Notes 6: Query Compilation 14 Example: One Physical Plan Parameters: join order, memory size, project attributes,... Hash join SEQ scan index scan Parameters: Select Condition,... StarsIn MovieStar
  • 15. DBMS 2001 Notes 6: Query Compilation 15 Next: Closer look at ... • Transformation rules • Estimating result sizes
  • 16. DBMS 2001 Notes 6: Query Compilation 16 Relational algebra optimization • Transformation rules (preserve equivalence) • What are good transformations?
  • 17. DBMS 2001 Notes 6: Query Compilation 17 Rules: Natural joins R S = S R (commutative) (R S) T = R (S T) (associative) • Carry attribute names in results, so order is not important −> Can evaluate in any order
  • 18. DBMS 2001 Notes 6: Query Compilation 18 (R x S) x T = R x (S x T) R x S = S x R R U (S U T) = (R U S) U T R U S = S U R Rules: Cross products & union similarly (both associative & commutative):
  • 19. DBMS 2001 Notes 6: Query Compilation 19 Rules: Selects 1. σp1∧p2(R) = 2. σp1vp2(R) = σp1 [ σp2 (R)] [ σp1 (R)] U [ σp2 (R)] 1. Especially useful (applied left-to-right): Allows compound select conditions to be split and moved to suitable positions (See next) NB: ∧ = AND; v = OR
  • 20. DBMS 2001 Notes 6: Query Compilation 20 Rules: Products to Joins Definition of Natural Join: R S = πL[σC(R x S)] ; Condition C equates attributes common to R and S, and πL projects one copy of them out Applied right-to-left - definition of general join applied similarly
  • 21. DBMS 2001 Notes 6: Query Compilation 21 Let p = predicate with only R attribs q = predicate with only S attribs m = predicate with attribs common to R,S σp (R S) = σq (R S) = Rules: σ + combined [σp (R)] S R [σq (S)] More rules can be derived...
  • 22. DBMS 2001 Notes 6: Query Compilation 22 σp∧q (R S) = [σp (R)] [σq (S)] σp∧q∧m (R S) = σm [(σp R) (σq S)] σpvq (R S) = [(σp R) S] U [R (σq S)] Derived rules for σ +
  • 23. DBMS 2001 Notes 6: Query Compilation 23 Derivation for first one; Others for homework: σp∧q (R S) = σp [σq (R S) ] = σp [(R σq (S) ] = [σp (R)] [σq (S)]
  • 24. DBMS 2001 Notes 6: Query Compilation 24 Rules for σ, π combined with X similar... e.g., σp (R X S) = ?
  • 25. DBMS 2001 Notes 6: Query Compilation 25 σp1∧p2 (R) → σp1 [σp2 (R)] σp (R S) → [σp (R)] S R S → S R Some “good” transformations: • No transformation is always good • Usually good: early selections
  • 26. DBMS 2001 Notes 6: Query Compilation 26 Outline - Query Processing • Relational algebra level – transformations – good transformations • Detailed query plan level – estimate costs – generate and compare plans next
  • 27. DBMS 2001 Notes 6: Query Compilation 27 • Estimating cost of query plan (1) Estimating size of intermediate results −> (2) Estimating # of I/Os (considered last week)
  • 28. DBMS 2001 Notes 6: Query Compilation 28 Estimating result size • Maintain statistics for relation R – T(R) : # tuples in R – L(R) : Length of rows, # of bytes in each R tuple – B(R): # of blocks to hold all R tuples – V(R, A) : # distinct values in R for attribute A
  • 29. DBMS 2001 Notes 6: Query Compilation 29 Example R A: 20 byte string B: 4 byte integer C: 8 byte date D: 5 byte string A B C D cat 1 10 a cat 1 20 b dog 1 30 a dog 1 40 c bat 1 50 d T(R) = 5 L(R) = 37 V(R,A) = 3 V(R,C) = 5 V(R,B) = 1 V(R,D) = 4
  • 30. DBMS 2001 Notes 6: Query Compilation 30 Size estimates for product W = R1xR2 T(W) = L(W) = T(R1) × T(R2) L(R1) + L(R2)
  • 31. DBMS 2001 Notes 6: Query Compilation 31 L(W) = L(R) T(W) = ? Estimates for selection W = σA=a (R) = AVG number of tuples that satisfy an equality condition on R.A = …
  • 32. DBMS 2001 Notes 6: Query Compilation 32 Example R V(R,A)=3 V(R,B)=1 V(R,C)=5 V(R,D)=4 AVG size of, say, σA=val(R)? T(σA=cat(R))=2, T(σA=dog(R))=2, T(σA=bat(R))=1 => (2+2+1)/3 = 5/3 A B C D cat 1 10 a cat 1 20 b dog 1 30 a dog 1 40 c bat 1 50 d
  • 33. DBMS 2001 Notes 6: Query Compilation 33 Size of W = σZ=a(R) in general: Assume: Only existing Z values a1, a2, … are used in a select expression Z= ai, each with equal probalility 1/V(R,Z) => the expected size of σZ=val(R) is E(T(W)) = Σi 1/V(R,Z) * T(σZ=a i (R)) = T(R)/V(R,Z)
  • 34. DBMS 2001 Notes 6: Query Compilation 34 What about selection W=σz ≥ val (R) ? T(W) = ? • Estimate 1: T(W) = T(R)/2 – Rationale: On avg, 1/2 of tuples satisfy the condition • Estimate 2: T(W) = T(R)/3 – Rationale: Acknowledges the tendency of selecting "interesting" (e.g., rare tuples) more frequently <, ≤ and < similary
  • 35. DBMS 2001 Notes 6: Query Compilation 35 Size estimates for W = R1 R2 Let X = attributes of R1 Y = attributes of R2 X ∩ Y = ∅ Same as R1 x R2 Case 1
  • 36. DBMS 2001 Notes 6: Query Compilation 36 W = R1 R2 X ∩ Y = A R1 A B C R2 A D Case 2 Assumption: V(R1,A) ≤ V(R2,A) ⇒ πA(R1) ⊆ πA(R2) V(R2,A) ≤ V(R1,A) ⇒ πA(R2) ⊆ πA(R1) “Containment of value sets” [Sec. 7.4.4]
  • 37. DBMS 2001 Notes 6: Query Compilation 37 Why should “containment of value sets” hold? Consider joining relations Faculties(FName, …) and Depts(DName, FName, …) where Depts.FName is a foreign key, and faculties can have 0,…,n departments. Now V(Depts, FName) ≤ V(Faculties, FName), and referential integrity requires that πFName(Depts) ⊆ πFName(Faculties)
  • 38. DBMS 2001 Notes 6: Query Compilation 38 R1 A B C R2 A D a Estimating T(W) when V(R1,A) ≤ V(R2,A) Take 1 tuple Match Each estimated to match with T(R2)/V(R2,A) tuples ... so T(W) = T(R1)×T(R2)/V(R2, A)
  • 39. DBMS 2001 Notes 6: Query Compilation 39 • V(R1,A) ≤ V(R2,A): T(W) = T(R1)T(R2)/V(R2,A) • V(R2,A) ≤ V(R1,A): T(W) = T(R2)T(R1)/V(R1,A) [A is the join attribute] => in general: T(W) = T(R1)T(R2)/max{V(R1,A), V(R2,A) }
  • 40. DBMS 2001 Notes 6: Query Compilation 40 With similar ideas, can estimate sizes of: • W = ΠAB (R) ….. [Sec. 7.4.2] • W = σA=a∧B=b (R) = σA=a (σB=b (R)); Ass. A and B independent => T(W) = T(R)/(V(R, A) x V(R, B)) • W=R(A,X,Y) S(X,Y,B); [Sec. 7.4.5] Ass. X and Y independent => … T(W) = T(R)T(S)/(max{V(R,X), V(S,X)} x max{V(R,Y), V(S,Y)}) • Union, intersection, diff, …. [Sec. 7.4.7]
  • 41. DBMS 2001 Notes 6: Query Compilation 41 Note: for complex expressions, need intermediate estimates. E.g. W = [σA=a (R1) ] R2 Treat as relation U T(U) = T(R1)/V(R1,A) L(U) = L(R1) Also need an estimate for V(U, Ai) !
  • 42. DBMS 2001 Notes 6: Query Compilation 42 To estimate V (U, Ai) E.g., U = σA=a (R) Say R has attributes A,B,C,D V(U, A) = ? V(U, B) = ? V(U, C) = ? V(U, D) = ?
  • 43. DBMS 2001 Notes 6: Query Compilation 43 Example R V(R,A)=3 V(R,B)=1 V(R,C)=T(R)=5 V(R,D)=3 U = σA=x (R) A B C D cat 1 10 10 cat 1 20 20 dog 1 30 10 dog 1 40 30 bat 1 50 10 V(U, D) = V(R,D)/T(R,A) ... V(R,D) V(U,A) =1 V(U,B) =1 V(U,C) = T(R) V(R,A)
  • 44. DBMS 2001 Notes 6: Query Compilation 44 V(U,Ai) for Joins U = R1(A,B) R2(A,C) V(U,A) = min { V(R1, A), V(R2, A) } V(U,B) = V(R1, B) V(U,C) = V(R2, C) [Assumption called “preservation of value sets”, sect. 7.4.4] Values of non-join attributes preserved
  • 45. DBMS 2001 Notes 6: Query Compilation 45 Example: Z = R1(A,B) R2(B,C) R3(C,D) • R1: T(R1) = 1000 V(R1,A)=50 V(R1,B)=100 • R2: T(R2) = 2000 V(R2,B)=200 V(R2,C)=300 • R3: T(R3) = 3000 V(R3,C)=90 V(R3,D)=500
  • 46. DBMS 2001 Notes 6: Query Compilation 46 T(U) = T(R1) ×T(R2)/max{V(R1,B), V(R2,B)} = 1000×2000/200 = 10,000; V(U,A) = V(R1,A) = 50 V(U,C) = V(R2,C) = 300 V(U,B) = min{V(R1,B), V(R2,B)}= 100 Intermediate Result: U(A,B,C)= R1(A,B) R2(B,C)
  • 47. DBMS 2001 Notes 6: Query Compilation 47 Z = U(A,B,C) R3(C,D) T(Z) = T(U)×T(R3)/ max{V(U,C), V(R3,C)} = 10,000×3000/300 = 100,000 V(Z,A) = V(U,A) = 50 V(Z,B) = V(U,B) = 100 V(Z,C) = min{V(U,C), V(R3,C)} = 90 V(Z,D) = V(R3,D) =500
  • 48. DBMS 2001 Notes 6: Query Compilation 48 Outline/Summary • Estimating cost of query plan – Estimating size of results done! – Estimating # of IOs last week • Generate and compare plans – skip this • Execute physical operations of query plan – Sketched last week