The document discusses the process of query compilation in a database management system. It involves 6 main steps: 1) Parsing the SQL query into a parse tree, 2) Converting the parse tree into a logical query plan, 3) Optimizing the logical query plan by applying transformation rules, 4) Estimating the sizes of results from operations in the logical query plan, 5) Generating multiple physical query plans from the logical plan, and 6) Estimating the costs of physical plans and selecting the most efficient plan to execute. The document focuses on relational algebra rules for optimization and techniques for estimating result sizes of operations like selections, joins, and projections.
The aim of this presentation is to revise the functional regression models with scalar response (Linear, Nonlinear and Semilinear) and the extension to the more general case where the response belongs to the exponential family (binomial, poisson, gamma, ...). This extension allows to develop new functional classification methods based on this regression models. Some examples along with code implementation in R are provided during the talk. Lecturer: Manuel Febrero Bande, Univ. de Santiago de Compostela, Spain.
Multinomial Logistic Regression with Apache SparkDB Tsai
Logistic Regression can not only be used for modeling binary outcomes but also multinomial outcome with some extension. In this talk, DB will talk about basic idea of binary logistic regression step by step, and then extend to multinomial one. He will show how easy it's with Spark to parallelize this iterative algorithm by utilizing the in-memory RDD cache to scale horizontally (the numbers of training data.) However, there is mathematical limitation on scaling vertically (the numbers of training features) while many recent applications from document classification and computational linguistics are of this type. He will talk about how to address this problem by L-BFGS optimizer instead of Newton optimizer.
Bio:
DB Tsai is a machine learning engineer working at Alpine Data Labs. He is recently working with Spark MLlib team to add support of L-BFGS optimizer and multinomial logistic regression in the upstream. He also led the Apache Spark development at Alpine Data Labs. Before joining Alpine Data labs, he was working on large-scale optimization of optical quantum circuits at Stanford as a PhD student.
The aim of this presentation is to revise the functional regression models with scalar response (Linear, Nonlinear and Semilinear) and the extension to the more general case where the response belongs to the exponential family (binomial, poisson, gamma, ...). This extension allows to develop new functional classification methods based on this regression models. Some examples along with code implementation in R are provided during the talk. Lecturer: Manuel Febrero Bande, Univ. de Santiago de Compostela, Spain.
Multinomial Logistic Regression with Apache SparkDB Tsai
Logistic Regression can not only be used for modeling binary outcomes but also multinomial outcome with some extension. In this talk, DB will talk about basic idea of binary logistic regression step by step, and then extend to multinomial one. He will show how easy it's with Spark to parallelize this iterative algorithm by utilizing the in-memory RDD cache to scale horizontally (the numbers of training data.) However, there is mathematical limitation on scaling vertically (the numbers of training features) while many recent applications from document classification and computational linguistics are of this type. He will talk about how to address this problem by L-BFGS optimizer instead of Newton optimizer.
Bio:
DB Tsai is a machine learning engineer working at Alpine Data Labs. He is recently working with Spark MLlib team to add support of L-BFGS optimizer and multinomial logistic regression in the upstream. He also led the Apache Spark development at Alpine Data Labs. Before joining Alpine Data labs, he was working on large-scale optimization of optical quantum circuits at Stanford as a PhD student.
2014-06-20 Multinomial Logistic Regression with Apache SparkDB Tsai
Logistic Regression can not only be used for modeling binary outcomes but also multinomial outcome with some extension. In this talk, DB will talk about basic idea of binary logistic regression step by step, and then extend to multinomial one. He will show how easy it's with Spark to parallelize this iterative algorithm by utilizing the in-memory RDD cache to scale horizontally (the numbers of training data.) However, there is mathematical limitation on scaling vertically (the numbers of training features) while many recent applications from document classification and computational linguistics are of this type. He will talk about how to address this problem by L-BFGS optimizer instead of Newton optimizer.
Bio:
DB Tsai is a machine learning engineer working at Alpine Data Labs. He is recently working with Spark MLlib team to add support of L-BFGS optimizer and multinomial logistic regression in the upstream. He also led the Apache Spark development at Alpine Data Labs. Before joining Alpine Data labs, he was working on large-scale optimization of optical quantum circuits at Stanford as a PhD student.
Relazioni presentate al convegno GIORNATE DI NEUROPSICOLOGIA DELL'ETÁ EVOLUTIVA
che si è svolto dal 20 al 23 Gennaio 2016, a Bressanone (BZ), presso la Libera Università di Bolzano, Via Ratisbona, 16.
Online chat is a form of communication that utilizes computer programs that allow for two-way conversations between users in real time (events that occur in cyberspace at the same speed that they would occur in real li
Enterprises are constantly working to implement new, faster, better technology to run their businesses. In turn, cyberattackers are working equally as hard to find ways to breach that technology, and security professionals are churning out solutions to thwart attacks. This cycle of activity leads to today’s layered, complex enterprise security ecosystems. These ecosystems are like any ecosystem in the natural world, with interdependencies, limited resources, and a need for balance to make them run smoothly. If one layer falters, the whole ecosystem can become unstable.
With the recent introduction of applications as a business driver, the security ecosystem needs to adapt. The application layer is now a critical player, and requires a reworking of the ecosystem to restore balance and security. However, this reworking has yet to happen in many cases, leading to the surge of breaches we’ve seen lately. End-point and network security tend to garner the lion’s share of IT attention – leading to an unbalanced security ecosystem, an exposed application layer, and serious breaches.
It is important to understand all the layers of security and how they work together to secure your enterprise. Start by getting the facts and stats with our new gbook, The Seven Kinds of Security.
2014-06-20 Multinomial Logistic Regression with Apache SparkDB Tsai
Logistic Regression can not only be used for modeling binary outcomes but also multinomial outcome with some extension. In this talk, DB will talk about basic idea of binary logistic regression step by step, and then extend to multinomial one. He will show how easy it's with Spark to parallelize this iterative algorithm by utilizing the in-memory RDD cache to scale horizontally (the numbers of training data.) However, there is mathematical limitation on scaling vertically (the numbers of training features) while many recent applications from document classification and computational linguistics are of this type. He will talk about how to address this problem by L-BFGS optimizer instead of Newton optimizer.
Bio:
DB Tsai is a machine learning engineer working at Alpine Data Labs. He is recently working with Spark MLlib team to add support of L-BFGS optimizer and multinomial logistic regression in the upstream. He also led the Apache Spark development at Alpine Data Labs. Before joining Alpine Data labs, he was working on large-scale optimization of optical quantum circuits at Stanford as a PhD student.
Relazioni presentate al convegno GIORNATE DI NEUROPSICOLOGIA DELL'ETÁ EVOLUTIVA
che si è svolto dal 20 al 23 Gennaio 2016, a Bressanone (BZ), presso la Libera Università di Bolzano, Via Ratisbona, 16.
Online chat is a form of communication that utilizes computer programs that allow for two-way conversations between users in real time (events that occur in cyberspace at the same speed that they would occur in real li
Enterprises are constantly working to implement new, faster, better technology to run their businesses. In turn, cyberattackers are working equally as hard to find ways to breach that technology, and security professionals are churning out solutions to thwart attacks. This cycle of activity leads to today’s layered, complex enterprise security ecosystems. These ecosystems are like any ecosystem in the natural world, with interdependencies, limited resources, and a need for balance to make them run smoothly. If one layer falters, the whole ecosystem can become unstable.
With the recent introduction of applications as a business driver, the security ecosystem needs to adapt. The application layer is now a critical player, and requires a reworking of the ecosystem to restore balance and security. However, this reworking has yet to happen in many cases, leading to the surge of breaches we’ve seen lately. End-point and network security tend to garner the lion’s share of IT attention – leading to an unbalanced security ecosystem, an exposed application layer, and serious breaches.
It is important to understand all the layers of security and how they work together to secure your enterprise. Start by getting the facts and stats with our new gbook, The Seven Kinds of Security.
Gabaritos para Imprimir - Tamanho Real de Parafusos, Porcas e ArruelasRH Indufix Fixadores
Quanta vezes você precisou comprar um parafuso, porca ou arruela e não sabia ao certo qual era a medida, tipo de cabeça, passo de rosca ou até mesmo a aplicação correta?
Para resolver essas questões, preparamos um guia onde você encontrará imagens de fixadores em TAMANHO REAL que, ao serem impressas, podem lhe auxiliar a descobrir a medida do item!
Imprima a página com o gabarito, posicione sua amostra de parafuso, porca ou arruela sobre a imagem correspondente e identifique corretamente suas dimensões. Viu como é fácil?
PARA MELHOR QUALIDADE, BAIXE NO LINK http://www.catalogo.indufix.com.br/tamanho-real-parafuso-porca-arruela-gabarito-para-imprimir
Market research and proposed strategy for the Brother P-Touch to become an essential tool for kids and parents during the Back-to-school retail season. This work was completed during my time with Concept Farm as an account supervisor and strategist on the brand.
S1 - Process product optimization using design experiments and response surfa...CAChemE
An intensive practical course mainly for PhD-students on the use of designs of experiments (DOE) and response surface methodology (RSM) for optimization problems. The course covers relevant background, nomenclature and general theory of DOE and RSM modelling for factorial and optimisation designs in addition to practical exercises in Matlab. Due to time limitations, the course concentrates on linear and quadratic models on the k≤3 design dimension. This course is an ideal starting point for every experimental engineering wanting to work effectively, extract maximal information and predict the future behaviour of their system.
Mikko Mäkelä (DSc, Tech) is a postdoctoral fellow at the Swedish University of Agricultural Sciences in Umeå, Sweden and is currently visiting the Department of Chemical Engineering at the University of Alicante. He is working in close cooperation with Paul Geladi, Professor of Chemometrics, and using DOE and RSM for process optimization mainly for the valorization of industrial wastes in laboratory and pilot scales.”
DBMS Architecture
Query Executor
Buffer Manager
Storage Manager
Storage
Transaction Manager
Logging &
Recovery
Lock Manager
Buffers Lock Tables
Main Memory
User/Web Forms/Applications/DBA
query transaction
Query Optimizer
Query Rewriter
Query Parser
Files & Access Methods
Past lectures
Today’s
lecture
Many query plans to execute a SQL query
3
S UTR
SR
T
U
SR
T
U
• Even more plans: multiple algorithms to execute each operation
SR
T
U
Sort-merge
Sort-merge
hash join
Table-scan
index-scan
Table-scanindex-scan
• Compute the join of R(A,B) S(B,C) T(C,D) U(D,E)
Explain command in PostgreSQL
4
SELECT C.STATE, SUM(O.NETAMOUNT), SUM(O.TOTALAMOUNT)
FROM CUSTOMERS C
JOIN CUST_HIST CH ON C.CUSTOMERID = CH.CUSTOMERID
JOIN ORDERS O ON CH.ORDERID = O.ORDERID
GROUP BY C.STATE
Communication between operators:
iterator model
• Each physical operator implements three functions:
– Open: initializes the data structures.
– GetNext: returns the next tuple in the result.
– Close: ends the operation and frees the resources.
• It enables pipelining
• Other option: compute the result of the operator in full
and store it in disk or memory:
– inefficient.
5
Query optimization: picking the fastest plan
• Optimal approach plan
– enumerate each possible plan
– measure its performance by running it
– pick the fastest one
– What’s wrong?
• Rule-based optimization
– Use a set of pre-defined rules to generate a fast plan
• e.g., If there is an index over a table, use it for scan and join.
6
Review Definitions
• Statistics on table R:
– T(R): Number of tuples in R
– B(R): Number of blocks in R
• B(R) = T(R ) / block size
– V(R,A): Number of distinct values of attribute A in R
7
Plans to select tuples from R: σA=a(R)
• We have an unclustered index on R
• Plans:
– (Unclustered) indexed-based scan
– Table-scan (sequential access)
• Statistics on R
– B(R)=5000, T(R)=200,000
– V(R,A) = 2, one value appears in 95% of tuples.
• Unclustered indexed scan vs. table-scan ?
8
Presenter
Presentation Notes
Unclustered indexed scan vs. table-scan ?
table-scan is the winner!
Query optimization methods
• Rule-based optimizer fails
– It uses static rules
– The rules do not consider the distribution of the data.
• Cost-based optimization
– predict the cost of each plan
– search the plan space to find the fastest one
– do it efficiently
• Optimization itself should be fast!
9
Cost-based optimization
• Plan space
– which plans to consider?
– it is time consuming to explore all alternatives.
• Cost estimator
– how to estimate the cost of each plan without executing it?
– we would like to have accurate estimation
• Search algorithm
– how to search the plan space fast?
– we would like to avoid checking inefficient plans
10
Space of query plans: basic operators
• Selection
– algorithms: sequential, index-based
– Ordering
• Projection
– Ordering
• Join
– algorithms: nested loop, sort-merge, hash
– orderi ...
• Process for heuristics optimization
1. The parser of a high-level query generates an initial internal representation;
2. Apply heuristics rules to optimize the internal representation.
3. A query execution plan is generated to execute groups of operations based on the access paths available on the files involved in the query.
1. DBMS 2001 Notes 6: Query Compilation 1
Principles of Database
Management Systems
6: Query Compilation and
Optimization
Pekka Kilpeläinen
(partially based on Stanford CS245 slide
originals by Hector Garcia-Molina, Jeff Ullman
and Jennifer Widom)
2. DBMS 2001 Notes 6: Query Compilation 2
Overview
• We have studied recently:
– Algorithms for selections and joins, and their costs
(in terms of disk I/O)
• Next: A closer look at query compilation
– Parsing
– Algebraic optimization of logical query plans
– Estimating sizes of intermediate results
• Focus still on selections and joins
• Remember the overall process of query
execution:
3. DBMS 2001 Notes 6: Query Compilation 3
{P1,P2,…..}
{P1,C1>...}
parse
convert
apply laws
estimate result sizes
consider physical plans estimate costs
pick best
execute
Pi
answer
SQL query
parse tree
logical query plan
“improved” l.q.p
l.q.p. +sizes
statistics
4. DBMS 2001 Notes 6: Query Compilation 4
Step 1: Parsing
• Check syntactic correctness of the query, and
transform it into a parse tree
– Based on the formal grammar for SQL
– Inner nodes nonterminal symbols (syntactic
categories for things like <Query>, <RelName>, or
<Condition>)
– Leaves terminal symbols: names of relations or
attributes, keywords (SELECT, FROM, …),
operators (+, AND, OR, LIKE, …), operands (10,
'%1960', …)
– Also check semantic correctness: Relations and
attributes exist, operands compatible with
operators, …
5. DBMS 2001 Notes 6: Query Compilation 5
Example: SQL query
Consider querying a movie database with relations
StarsIn(title, year, starName)
MovieStar(name, address, gender, birthdate)
SELECT title
FROM StarsIn, MovieStar
WHERE starName = name AND birthdate LIKE ‘%1960’ ;
(Find the titles for movies with stars born in 1960)
6. DBMS 2001 Notes 6: Query Compilation 6
Example: Parse Tree
<Query>
<SFW>
SELECT <SelList> FROM <FromList> WHERE <Condition>
<Attribute> <RelName> , <FromList> AND
title StarsIn <RelName>
<Condition> <Condition>
<Attribute> = <Attribute> <Attribute> LIKE <Pattern>
starName name birthdate ‘%1960’
MovieStar
7. DBMS 2001 Notes 6: Query Compilation 7
Step 2:
Parse Tree −> Logical Query
Plan• Basic strategy:
SELECT A, B, C
FROM R1, R2
WHERE Cond ;
becomes
πA,B,C[σCond(R1 x R2)]
8. DBMS 2001 Notes 6: Query Compilation 8
Example: Logical Query Plan
Πtitle
σstarName=name and birthdate LIKE ‘%1960’
StarsIn MovieStar
×
9. DBMS 2001 Notes 6: Query Compilation 9
Step 3: Improving the L.Q.P
• Transform the logical query plan into an
equivalent form expected to lead to
better performance
– Based on laws of relational algebra
– Normal to produce a single optimized form,
which acts as input for the generation of
physical query plans
10. DBMS 2001 Notes 6: Query Compilation 10
Example: Improved Logical Query Plan
Πtitle
starName=name
StarsIn
σbirthdate LIKE ‘%1960’
MovieStar
11. DBMS 2001 Notes 6: Query Compilation 11
Step 4: Estimate Result Sizes
• Cost estimates of database algorithms
depend on sizes of input relations
−> Need to estimate sizes of
intermediate results
– Estimates based on statistics about
relations, gathered periodically or
incrementally
12. DBMS 2001 Notes 6: Query Compilation 12
Example: Estimate Result Sizes
Need expected size
StarsIn
MovieStar
σ
13. DBMS 2001 Notes 6: Query Compilation 13
Steps 5, 6, ...
• Generate and compare query plans
– generate alternate query plans P1, …, Pk by
selecting algorithms and execution orders
for relational operations
– Estimate the cost of each plan
– Choose the plan Pi estimated to be "best"
• Execute plan Pi and return its result
14. DBMS 2001 Notes 6: Query Compilation 14
Example: One Physical Plan
Parameters: join order,
memory size, project attributes,...
Hash join
SEQ scan index scan Parameters:
Select Condition,...
StarsIn MovieStar
15. DBMS 2001 Notes 6: Query Compilation 15
Next: Closer look at ...
• Transformation rules
• Estimating result sizes
16. DBMS 2001 Notes 6: Query Compilation 16
Relational algebra optimization
• Transformation rules
(preserve equivalence)
• What are good transformations?
17. DBMS 2001 Notes 6: Query Compilation 17
Rules: Natural joins
R S = S R (commutative)
(R S) T = R (S T) (associative)
• Carry attribute names in results, so order
is not important
−> Can evaluate in any order
18. DBMS 2001 Notes 6: Query Compilation 18
(R x S) x T = R x (S x T)
R x S = S x R
R U (S U T) = (R U S) U T
R U S = S U R
Rules:
Cross products & union similarly
(both associative & commutative):
19. DBMS 2001 Notes 6: Query Compilation 19
Rules: Selects
1. σp1∧p2(R) =
2. σp1vp2(R) =
σp1 [ σp2 (R)]
[ σp1 (R)] U [ σp2 (R)]
1. Especially useful (applied left-to-right):
Allows compound select conditions to be
split and moved to suitable positions
(See next)
NB: ∧ = AND; v = OR
20. DBMS 2001 Notes 6: Query Compilation 20
Rules: Products to Joins
Definition of Natural Join:
R S = πL[σC(R x S)] ;
Condition C equates attributes common to R
and S, and πL projects one copy of them out
Applied right-to-left
- definition of general join applied similarly
21. DBMS 2001 Notes 6: Query Compilation 21
Let p = predicate with only R attribs
q = predicate with only S attribs
m = predicate with attribs common to R,S
σp (R S) =
σq (R S) =
Rules: σ + combined
[σp (R)] S
R [σq (S)]
More rules can be derived...
23. DBMS 2001 Notes 6: Query Compilation 23
Derivation for first one;
Others for homework:
σp∧q (R S) =
σp [σq (R S) ] =
σp [(R σq (S) ] =
[σp (R)] [σq (S)]
24. DBMS 2001 Notes 6: Query Compilation 24
Rules for σ, π combined with X
similar...
e.g., σp (R X S) = ?
25. DBMS 2001 Notes 6: Query Compilation 25
σp1∧p2 (R) → σp1 [σp2 (R)]
σp (R S) → [σp (R)] S
R S → S R
Some “good” transformations:
• No transformation is always good
• Usually good: early selections
26. DBMS 2001 Notes 6: Query Compilation 26
Outline - Query Processing
• Relational algebra level
– transformations
– good transformations
• Detailed query plan level
– estimate costs
– generate and compare plans
next
27. DBMS 2001 Notes 6: Query Compilation 27
• Estimating cost of query plan
(1) Estimating size of intermediate results
−>
(2) Estimating # of I/Os
(considered last week)
28. DBMS 2001 Notes 6: Query Compilation 28
Estimating result size
• Maintain statistics for relation R
– T(R) : # tuples in R
– L(R) : Length of rows,
# of bytes in each R tuple
– B(R): # of blocks to hold all R tuples
– V(R, A) :
# distinct values in R for attribute A
29. DBMS 2001 Notes 6: Query Compilation 29
Example
R A: 20 byte string
B: 4 byte integer
C: 8 byte date
D: 5 byte string
A B C D
cat 1 10 a
cat 1 20 b
dog 1 30 a
dog 1 40 c
bat 1 50 d
T(R) = 5 L(R) = 37
V(R,A) = 3 V(R,C) = 5
V(R,B) = 1 V(R,D) = 4
31. DBMS 2001 Notes 6: Query Compilation 31
L(W) = L(R)
T(W) = ?
Estimates for selection W = σA=a (R)
= AVG number of tuples that satisfy
an equality condition on R.A
= …
32. DBMS 2001 Notes 6: Query Compilation 32
Example
R V(R,A)=3
V(R,B)=1
V(R,C)=5
V(R,D)=4
AVG size of, say, σA=val(R)?
T(σA=cat(R))=2, T(σA=dog(R))=2, T(σA=bat(R))=1
=> (2+2+1)/3 = 5/3
A B C D
cat 1 10 a
cat 1 20 b
dog 1 30 a
dog 1 40 c
bat 1 50 d
33. DBMS 2001 Notes 6: Query Compilation 33
Size of W = σZ=a(R) in general:
Assume: Only existing Z values a1, a2, …
are used in a select expression Z= ai,
each with equal probalility 1/V(R,Z)
=> the expected size of σZ=val(R) is
E(T(W)) = Σi 1/V(R,Z) * T(σZ=a
i (R))
= T(R)/V(R,Z)
34. DBMS 2001 Notes 6: Query Compilation 34
What about selection W=σz ≥ val (R) ?
T(W) = ?
• Estimate 1: T(W) = T(R)/2
– Rationale: On avg, 1/2 of tuples satisfy the condition
• Estimate 2: T(W) = T(R)/3
– Rationale: Acknowledges the tendency of
selecting "interesting" (e.g., rare tuples) more
frequently
<, ≤ and < similary
35. DBMS 2001 Notes 6: Query Compilation 35
Size estimates for W = R1 R2
Let X = attributes of R1
Y = attributes of R2
X ∩ Y = ∅
Same as R1 x R2
Case 1
36. DBMS 2001 Notes 6: Query Compilation 36
W = R1 R2 X ∩ Y = A
R1 A B C R2 A D
Case 2
Assumption:
V(R1,A) ≤ V(R2,A) ⇒ πA(R1) ⊆ πA(R2)
V(R2,A) ≤ V(R1,A) ⇒ πA(R2) ⊆ πA(R1)
“Containment of value sets” [Sec. 7.4.4]
37. DBMS 2001 Notes 6: Query Compilation 37
Why should “containment of value sets” hold?
Consider joining relations
Faculties(FName, …) and
Depts(DName, FName, …)
where Depts.FName is a foreign key, and
faculties can have 0,…,n departments.
Now V(Depts, FName) ≤ V(Faculties,
FName), and referential integrity requires that
πFName(Depts) ⊆ πFName(Faculties)
38. DBMS 2001 Notes 6: Query Compilation 38
R1 A B C R2 A D
a
Estimating T(W) when V(R1,A) ≤ V(R2,A)
Take
1 tuple
Match
Each estimated to match with T(R2)/V(R2,A)
tuples ...
so T(W) = T(R1)×T(R2)/V(R2, A)
40. DBMS 2001 Notes 6: Query Compilation 40
With similar ideas, can estimate sizes of:
• W = ΠAB (R) ….. [Sec. 7.4.2]
• W = σA=a∧B=b (R) = σA=a (σB=b (R));
Ass. A and B independent =>
T(W) = T(R)/(V(R, A) x V(R, B))
• W=R(A,X,Y) S(X,Y,B); [Sec. 7.4.5]
Ass. X and Y independent => …
T(W) = T(R)T(S)/(max{V(R,X), V(S,X)} x
max{V(R,Y), V(S,Y)})
• Union, intersection, diff, …. [Sec. 7.4.7]
41. DBMS 2001 Notes 6: Query Compilation 41
Note: for complex expressions, need
intermediate estimates.
E.g. W = [σA=a (R1) ] R2
Treat as relation U
T(U) = T(R1)/V(R1,A) L(U) = L(R1)
Also need an estimate for V(U, Ai) !
42. DBMS 2001 Notes 6: Query Compilation 42
To estimate V (U, Ai)
E.g., U = σA=a (R)
Say R has attributes A,B,C,D
V(U, A) = ?
V(U, B) = ?
V(U, C) = ?
V(U, D) = ?
43. DBMS 2001 Notes 6: Query Compilation 43
Example
R V(R,A)=3
V(R,B)=1
V(R,C)=T(R)=5
V(R,D)=3
U = σA=x (R)
A B C D
cat 1 10 10
cat 1 20 20
dog 1 30 10
dog 1 40 30
bat 1 50 10
V(U, D) = V(R,D)/T(R,A) ... V(R,D)
V(U,A) =1 V(U,B) =1 V(U,C) = T(R)
V(R,A)
44. DBMS 2001 Notes 6: Query Compilation 44
V(U,Ai) for Joins U = R1(A,B) R2(A,C)
V(U,A) = min { V(R1, A), V(R2, A) }
V(U,B) = V(R1, B)
V(U,C) = V(R2, C)
[Assumption called
“preservation of value sets”, sect. 7.4.4]
Values of non-join
attributes preserved
48. DBMS 2001 Notes 6: Query Compilation 48
Outline/Summary
• Estimating cost of query plan
– Estimating size of results done!
– Estimating # of IOs last week
• Generate and compare plans
– skip this
• Execute physical operations of query plan
– Sketched last week