2. Relational Query Languages
Query languages: Allow manipulation and retrieval of
data from a database.
Relational model supports simple, powerful QLs:
Strong formal foundation based on logic.
Allows for much optimization.
Query Languages != programming languages!
QLs not intended to be used for complex calculations.
QLs support easy, efficient access to large data sets.
3. Formal Relational Query Languages
Two mathematical Query Languages form the basis
for “real” languages (e.g. SQL), and for
implementation:
Relational Algebra: More operational, very useful
for representing execution plans.
Relational Calculus: Lets users describe what they
want, rather than how to compute it. (Non-
procedural, declarative.)
We will concentrate on relational algebra in this
course
* Understanding Algebra & Calculus is key to
understanding SQL, query processing!
4. Preliminaries
A query is applied to relation instances, and the
result of a query is also a relation instance.
Schemas of input relations for a query are fixed (but
query will run over any legal instance)
The schema for the result of a given query is also fixed.
It is determined by the definitions of the query language
constructs.
5. Basics of Querying
A query is applied to relation instances, and the result of a query is
also a relation instance.
eid ename Salary age
28 Eric 90K 35
58 Kyle 100K 33
ename Salary
Eric 90K
Kyle 100K
6. Relational Algebra: 5 Basic Operations
Selection ( s ) Selects a subset of rows from relation
(horizontal).
Projection ( p ) Retains only wanted columns from
relation (vertical).
Cross-product ( ) Allows us to combine two relations.
Set-difference ( — ) Tuples in r1, but not in r2.
Union ( ) Tuples in r1 or in r2.
Since each operation returns a relation, operations can be
composed! (Algebra is “closed”.)
7. sid sname rating age
22 dustin 7 45.0
31 lubber 8 55.5
58 rusty 10 35.0
sid sname rating age
28 yuppy 9 35.0
31 lubber 8 55.5
44 guppy 5 35.0
58 rusty 10 35.0
sid bid day
22 101 10/10/96
58 103 11/12/96
R1
S1
S2
bid bname color
101 Interlake blue
102 Interlake red
103 Clipper green
104 Marine red
Boats
Example Instances
8. Projection (p)
Examples:
Retains only attributes that are in the “projection list”.
Schema of result:
exactly the fields in the projection list, with the same names that they had
in the input relation.
Projection operator has to eliminate duplicates (How do
they arise? Why remove them?)
Note: real systems typically don’t do duplicate elimination unless the user
explicitly asks for it.
)
2
(S
age
p
psname rating
S
,
( )
2
9. Projection (p)
age
35.0
55.5
sid sname rating age
28 yuppy 9 35.0
31 lubber 8 55.5
44 guppy 5 35.0
58 rusty 10 35.0
S2
sname rating
yuppy 9
lubber 8
guppy 5
rusty 10
)
2
(
,
S
rating
sname
p
page S
( )
2
10. Selection (s)
Selects rows that satisfy selection condition.
Result is a relation.
Schema of result is same as that of the input relation.
Do we need to do duplicate elimination?
srating
S
8
2
( ) p s
sname rating rating
S
,
( ( ))
8
2
sid sname rating age
28 yuppy 9 35.0
31 lubber 8 55.5
44 guppy 5 35.0
58 rusty 10 35.0
sname rating
yuppy 9
rusty 10
11. Union, Intersection and Set-
Difference
Both of these operations take two input relations, which must be
union-compatible:
Two relations R and S are union-compatible if
they have the same number of columns and
corresponding columns have the same
domains.
17. Set Difference
sid sname rating age
22 dustin 7 45.0
31 lubber 8 55.5
58 rusty 10 35.0
sid sname rating age
28 yuppy 9 35.0
31 lubber 8 55.5
44 guppy 5 35.0
58 rusty 10 35.0
S1
S2
sid sname rating age
22 dustin 7 45.0
S2 – S1
S S
1 2
sid sname rating age
28 yuppy 9 35.0
44 guppy 5 35.0
18. Cross-Product
S1 X R1
Each row of S1 is paired with each row of R1.
Result schema has one field per field of S1 and R1, with field names `inherited’ if possible.
sid sname rating age
22 dustin 7 45.0
31 lubber 8 55.5
58 rusty 10 35.0
sid bid day
22 101 10/10/96
58 103 11/12/96
S1 R1
27. (sid) sname rating age (sid) bid day
22 dustin 7 45.0 22 101 10/10/96
22 dustin 7 45.0 58 103 11/12/96
31 lubber 8 55.5 22 101 10/10/96
31 lubber 8 55.5 58 103 11/12/96
58 rusty 10 35.0 22 101 10/10/96
58 rusty 10 35.0 58 103 11/12/96
Both S1 and R1 have a field called sid. Which may
cause a conflict when referring to columns
S1 X R1
28. Cross-Product
S1 R1: Each row of S1 paired with each row of R1.
Q: How many rows in the result?
Result schema has one field per field of S1 and R1,
with field names `inherited’ if possible.
May have a naming conflict: Both S1 and R1 have a field with
the same name.
In this case, can use the renaming operator:
( ( , ), )
C sid sid S R
1 1 5 2 1 1
29. Renaming Operator
( ( , ), )
C sid sid S R
1 1 5 2 1 1
(sid) sname rating age (sid) bid day
22 dustin 7 45.0 22 101 10/10/96
22 dustin 7 45.0 58 103 11/12/96
31 lubber 8 55.5 22 101 10/10/96
31 lubber 8 55.5 58 103 11/12/96
58 rusty 10 35.0 22 101 10/10/96
58 rusty 10 35.0 58 103 11/12/96
sid1 sid2
Takes a relation schema and gives a new name to
the schema and the columns
1 2 3 4 5 6 7
S1 X R1
C
30. Compound Operator: Intersection
In addition to the 5 basic operators, there are several additional
“Compound Operators”
These add no computational power to the
language, but are useful shorthands.
Can be expressed solely with the basic ops.
Intersection takes two input relations, which must be union-
compatible.
Q: How to express it using basic operators?
R S = R (R S)
31. Compound Operator: Join
Joins are compound operators involving cross
product, selection, and (sometimes) projection.
Most common type of join is a “natural join” (often
just called “join”). R S conceptually is:
Compute R S
Select rows where attributes that appear in both relations
have equal values
Project all unique attributes and one copy of each of the
common ones.
32. Joins
Condition Join : c
S
R
)
( S
R
c
s
Result schema same as that of cross-product.
Fewer tuples than cross-product, might be able to compute
more efficiently
Sometimes called a theta-join.
Equi-Join: A special case of condition join where the condition
c contains only equalities.
33. Joins
(sid) sname rating age (sid) bid day
22 dustin 7 45.0 58 103 11/12/96
31 lubber 8 55.5 58 103 11/12/96
sid
R
sid
S
R
S
.
1
.
1
1
1
sid sname rating age
22 dustin 7 45.0
31 lubber 8 55.5
58 rusty 10 35.0
S1
sid bid day
22 101 10/10/96
58 103 11/12/96
R1
36. sid sname rating age
22 dustin 7 45.0
31 lubber 8 55.5
58 rusty 10 35.0
S1 sid bid day
22 101 10/10/96
58 103 11/12/96
R1
)
1
1
(
.
1
.
1
R
S
sid
R
sid
S
s
(sid) sname rating age (sid) bid day
22 dustin 7 45.0 58 103 11/12/96
31 lubber 8 55.5 58 103 11/12/96
37. sid sname rating age bid day
22 dustin 7 45.0 101 10/10/96
58 rusty 10 35.0 103 11/12/96
S R
sid
1 1
sid sname rating age
22 dustin 7 45.0
31 lubber 8 55.5
58 rusty 10 35.0
S1
sid bid day
22 101 10/10/96
58 103 11/12/96
R1
Example: Natural join
38. sid sname rating age bid day
22 dustin 7 45.0 101 10/10/96
58 rusty 10 35.0 103 11/12/96
S R
sid
1 1
sid sname rating age
22 dustin 7 45.0
31 lubber 8 55.5
58 rusty 10 35.0
S1
sid bid day
22 101 10/10/96
58 103 11/12/96
R1
Example: Natural join
39. sid sname rating age bid day
22 dustin 7 45.0 101 10/10/96
58 rusty 10 35.0 103 11/12/96
S R
sid
1 1
sid sname rating age
22 dustin 7 45.0
31 lubber 8 55.5
58 rusty 10 35.0
S1
sid bid day
22 101 10/10/96
58 103 11/12/96
R1
Example: Natural join
40. sid sname rating age bid day
22 dustin 7 45.0 101 10/10/96
58 rusty 10 35.0 103 11/12/96
S R
sid
1 1
sid sname rating age
22 dustin 7 45.0
31 lubber 8 55.5
58 rusty 10 35.0
S1
sid bid day
22 101 10/10/96
58 103 11/12/96
R1
Example: Natural join
41. Outer Join
An extension of the join operation that avoids loss
of information.
Computes the join and then adds tuples form one
relation that does not match tuples in the other
relation to the result of the join.
Uses null values:
null signifies that the value is unknown or does not
exist
All comparisons involving null are (roughly speaking)
false by definition.
42. Outer Join – Example
Relation loan
customer_name loan_number
Jones
Smith
Hayes
L-170
L-230
L-155
3000
4000
1700
loan_number amount
L-170
L-230
L-260
branch_name
Downtown
Redwood
Perryridge
Relation borrower
43. Outer Join – Example
Inner Join
loan Borrower
loan_number amount
L-170
L-230
3000
4000
customer_name
Jones
Smith
branch_name
Downtown
Redwood
Jones
Smith
null
loan_number amount
L-170
L-230
L-260
3000
4000
1700
customer_name
branch_name
Downtown
Redwood
Perryridge
n Left Outer Join
loan Borrower
44. Outer Join – Example
loan_number amount
L-170
L-230
L-155
3000
4000
null
customer_name
Jones
Smith
Hayes
branch_name
Downtown
Redwood
null
loan_number amount
L-170
L-230
L-260
L-155
3000
4000
1700
null
customer_name
Jones
Smith
null
Hayes
branch_name
Downtown
Redwood
Perryridge
null
n Full Outer Join
loan borrower
n Right Outer Join
loan borrower
45. Compound Operator: Division
Useful for expressing “for all” queries like:
Find sids of sailors who have reserved all boats.
For A/B, attributes of B must be subset of attrs of A.
May need to “project” to make this happen.
E.g., let A have 2 fields, x and y; B have only field y:
A/B contains all tuples (x) such that for every y tuple in B,
there is an xy tuple in A.
A B x y B( x,y A)
47. Expressing A/B Using Basic Operators
Division is not essential op; just a useful shorthand.
(Also true of joins, but joins are so common that systems
implement joins specially.)
Idea: For A(x,y)/B(y), compute all x values that are not
`disqualified’ by some y value in B.
x value is disqualified if by attaching y value from B, we
obtain an xy tuple that is not in A.
Disqualified x values: p p
x x A B A
(( ( ) ) )
A/B: p x A
( ) Disqualified x values