2. Relational Query Languages
Query languages: Allow manipulation and retrieval of
data from a database.
Relational model supports simple, powerful QLs:
− Strong formal foundation based on logic.
− Allows for much optimization.
Query Languages != programming languages!
− QLs not expected to be “Turing complete”.
− QLs not intended to be used for complex calculations.
− QLs support easy, efficient access to large data sets.
3. Formal Relational Query Languages
Two mathematical Query Languages form the
basis for “real” languages (e.g. SQL), and for
implementation:
Relational Algebra: More operational, very
useful for representing execution plans.
Relational Calculus: Lets users describe what
they want, rather than how to compute it. (Non-
operational, declarative.)
Understanding Algebra is key to understanding SQL,
and query processing!
4. Algebra Preliminaries
A query is applied to relation instances, and the
result of a query is also a relation instance.
− Schemas of input relations for a query are fixed (but
query will run regardless of instance!)
− The schema for the result of a given query is also fixed!
Determined by definition of query language constructs.
5. Example Instances
“Sailors” and “Reserves”
S1 sid sname rating age
relations for our examples.
22 dustin 7 45.0
R1
31 lubber 8 55.5
sid bid day 58 rusty 10 35.0
22 101 10/10/96
58 103 11/12/96 S2 sid sname rating age
28 yuppy 9 35.0
31 lubber 8 55.5
44 guppy 5 35.0
58 rusty 10 35.0
6. Algebra Operations
Look what we want to get from the following
table:
sid sname rating age
28 yuppy 9 35.0
31 lubber 8 55.5
44 guppy 5 35.0
58 rusty 10 35.0
7. sname rating
Projection yuppy 9
lubber 8
Deletes attributes that are not in guppy 5
projection list. rusty 10
Schema of result contains exactly
π sname,rating(S2)
the fields in the projection list, with
the same names that they had in the
(only) input relation.
Projection operator has to eliminate
duplicates! (Why??)
age
− Note: real systems typically 35.0
don’t do duplicate elimination 55.5
unless the user explicitly asks
for it. (Why not?) π age(S2)
8. S2
Selection rating age
sid sname
28 yuppy 9 35.0
Selects rows that satisfy 31 lubber 8 55.5
selection condition. 44 guppy 5 35.0
No duplicates in result! 58 rusty 10 35.0
(Why?)
Schema of result identical
σ rating >8(S 2) =
to schema of (only) input
relation.
sid sname rating age
28 yuppy 9 35.0
58 rusty 10 35.0
9. Composition of Operations
Result relation can be the input for another relational
algebra operation! (Operator composition.)
S2 sid sname rating age
28 yuppy 9 35.0
31 lubber 8 55.5
44 guppy 5 35.0
58 rusty 10 35.0 sname rating
π sname,rating (σ rating >8(S 2)) = yuppy 9
rusty 10
10. What do we want to get from two
relations?
S1
R1 sid sname rating age
sid bid day
22 dustin 7 45.0
22 101 10/10/96
31 lubber 8 55.5
58 103 11/12/96
58 rusty 10 35.0
What about: Who reserved boat 101?
Or: Find the name of the sailor who reserved boat 101.
11. Cross-Product
Each row of S1 is paired with each row of R1.
Result schema has one field per field of S1 and R1,
with field names inherited.
sid1 sname rating age sid2 bid day
22 dustin 7 45.0 22 101 10/10/96
22 dustin 7 45.0 58 103 11/12/96
31 lubber 8 55.5 22 101 10/10/96
31 lubber 8 55.5 58 103 11/12/96
58 rusty 10 35.0 22 101 10/10/96
58 rusty 10 35.0 58 103 11/12/96
Renaming operator (because naming conflict):
ρ(sid → sid1, S1) × ρ (sid → sid 2, R1)
12. Why does this cross product help
Query: Find the name of the sailor who reserved boat 101.
CP = ρ (sid → sid1, S1) × ρ (sid → sid 2, R1)
Result =π (σ (CP))
Sname sid1= sid 2 and bid =101
* Note my use of “temporary” relation CP.
13. Another example
Find the name of the sailor having the highest
rating.
AllR =π (rating − > ratingA, S 2)
ratingA
Result? =π (σ (S 2×AllR))
Sname rating < ratingA
What’s in “Result?” ?
Does it answer our query?
14. Union, Intersection, Set-Difference
sid sname rating age
All of these operations take 22 dustin 7 45.0
two input relations, which 31 lubber 8 55.5
must be union-compatible: 58 rusty 10 35.0
− Same number of fields. 44 guppy 5 35.0
− `Corresponding’ fields 28 yuppy 9 35.0
have the same type. S1∪ S2
What is the schema of result?
sid sname rating age
sid sname rating age 31 lubber 8 55.5
22 dustin 7 45.0 58 rusty 10 35.0
S1− S2 S1∩ S2
15. Back to our query
Find the name of the sailor having the highest
rating.
AllR =π (rating − > ratingA, S 2)
ratingA
Tmp =π (σ (S 2×AllR))
Sid ,Sname rating < ratingA
Result =π (π (S 2 )− Tmp)
Sname Sid ,Sname
* Why not project on Sid only for Tmp?
16. Relational Algebra (Summary)
Basic operations:
− Selection ( σ ) Selects a subset of rows from relation.
− Projection ( π ) Deletes unwanted columns from relation.
− Cross-product ( × ) Allows us to combine two relations.
− Set-difference ( - ) Tuples in reln. 1, but not in reln. 2.
− Union ( ∪ ) Tuples in reln. 1 and in reln. 2.
− Rename ( ρ ) Changes names of the attributes
Since each operation returns a relation, operations can be
composed! (Algebra is “closed”.)
Use of temporary relations recommended.
17. Natural Join
Natural-Join:
S1 R1=
sid sname rating age bid day
22 dustin 7 45.0 101 10/10/96
58 rusty 10 35.0 103 11/12/96
Result schema similar to cross-product, but only one copy of
each field which appears in both relations.
Natural Join = Cross Product + Selection on common fields +
drop duplicate fields.
18. Query revisited using natural join
Query: Find the name of the sailor who reserved boat 101.
Result =π (σ (S1 R1))
Sname bid =101
Or
Result =π (S1 σ (R1))
Sname bid =101
19. Consider yet another query
Find the sailor(s) who reserved all the red
boats.
R1 B
sid bid day bid color
22 101 10/10/96 101 Red
22 103 10/11/96 102 Green
58 102 11/12/96 103 Red
20. Start an attempt
who reserved what boat: sid bid
22 101
S1=π (R1) = 22 103
sid ,bid
58 102
All the red boats:
bid
S 2 =π (σ (B)) = 101
bid color = red 103
21. Division
Not supported as a primitive operator, but useful for
expressing queries like:
Find sailors who
have reserved all red boats.
Let S1 have 2 fields, x and y; S2 have only field y:
− S1/S2 = x | forall y in R2 (there exists x, y in R1)
− i.e., S1/S2 contains all x tuples (sailors) such that for every
y tuple (redboat) in S2, there is an xy tuple in S1 (i.e, x
reserved y).
In general, x and y can be any lists of fields; y is the list
of fields in S2, and x∪y is the list of fields of S1.
24. Find names of sailors who’ve reserved a red
boat
Information about boat color only available in
Boats; so need an extra join:
π sname ((σ Boats) Re serves Sailors)
color =' red '
A more efficient solution:
π sname (π ((π σ Boats) Re s) Sailors)
sid bid color =' red '
A query optimizer can find this given the first solution!
25. Find sailors who’ve reserved a red or a green
boat
Can identify all red or green boats, then find
sailors who’ve reserved one of these boats:
Tempboats =σ (Boats)
color ='red '∨color =' green'
Re sult =π sname(Tempboats Re serves Sailors)
Can also define Tempboats using union! (How?)
What happens if ∨ is replaced by ∧ in this query?
26. Find sailors who’ve reserved a red and a
green boat
Previous approach won’t work! Must identify
sailors who’ve reserved red boats, sailors who’ve
reserved green boats, then find the intersection
(note that sid is a key for Sailors):
Tempred =π ((σ (Boats)) Re serves)
sid color ='red '
Tempgreen =π ((σ (Boats)) Re serves)
sid color =' green'
Re sult =π sname((Tempred ∩Tempgreen) Sailors)
27. Find the names of sailors who’ve reserved all
boats
Uses division; schemas of the input relations
to / must be carefully chosen:
Tempsids =(π (Re serves))/ (π (Boats))
sid ,bid bid
Result =π sname(Tempsids Sailors)
To find sailors who’ve reserved all ‘Interlake’ boats:
..... /π (σ (Boats))
bid bname ='Interlake'
28. Summary
The relational model has rigorously defined
query languages that are simple and powerful.
Relational algebra is more operational; useful as
internal representation for query evaluation
plans.
Several ways of expressing a given query; a
query optimizer should choose the most
efficient version.