
Database Management System
CS157A SJSU Fall 2015 Kaya
What is a DB?
- Definition of a database
  - A collection of information organized to allow efficient retrieval.
  - ** Not necessarily a relational database (RDB) **
Why do we need a DB?
1. Sharing = support concurrent access (reads and writes) by multiple users.
2. Data model enforcement = make sure all applications see clean, organized data.
3. Scale = work with datasets too large to fit in memory.
4. Flexibility = use data in new and unanticipated ways.

Data Models
Database Models
- Kinds of database models
  1. Relational data model
  2. Object-oriented relational data model
  3. Semi-structured data model
Relational data model
- Excel-like, i.e., data is stored and manipulated as tables
- Has operations:
  - union, intersection, difference, selection, projection, products, join, renaming

Person ID | Last Name | First Name | Date of Birth | Home Addr Street | Home Addr City | Home Addr Zip | Home Addr State | Work Addr Street | Work Addr City | Work Addr Zip | Work Addr State
1 | Yamada | taro | 4/15 | aaa | bbb | 111 | CA | eee | fff | 222 | CA
Object-Oriented Relational Data Model
- Similar to the relational model
- Adds objects, classes, and inheritance, supported directly in the DB schema and the query language
OBJ: Person
  Last Name
  First Name
  Date of Birth
  Home Address  (refers to an Address object)
  Work Address  (refers to an Address object)
OBJ: Address
  Street
  City
  Zip
  State
Object-Oriented Relational Data Model (instances)
The same Person and Address classes, with one Person instance whose Home and Work attributes refer to two Address instances:
Instance: Person = (yamada, taro, 4/15, Home → Instance: Home Address, Work → Instance: Work Address)
Instance: Home Address = (aaa, bbb, 111, CA)
Instance: Work Address = (ccc, ddd, 222, CA)
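As a rough analogy only (this is plain Python, not the API of any object-relational DBMS), dataclasses can mimic the object model above: Address is defined once, and Person holds references to Address objects instead of eight flattened address columns. The class and field names are assumptions chosen to mirror the slide, and inheritance is not shown.

```python
from dataclasses import dataclass


@dataclass
class Address:
    street: str
    city: str
    zip: str
    state: str


@dataclass
class Person:
    last_name: str
    first_name: str
    date_of_birth: str
    home_address: Address  # reference to an Address object
    work_address: Address  # reference to another Address object


home = Address("aaa", "bbb", "111", "CA")
work = Address("ccc", "ddd", "222", "CA")
taro = Person("yamada", "taro", "4/15", home, work)

# Person stores references, so it sees changes made to the referenced Address object.
home.city = "xyz"  # hypothetical new value, for illustration only
print(taro.home_address.city)  # xyz
print(taro.work_address.zip)   # 222
```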
Semi-Structured Data Model
- Data are represented by a graph or tree
- XML is used to implement it
Movies
  movie (title: Gone with the wind)
    Year: 1939
    Length: 281
    Genre: drama
  movie (title: Star Wars)
    Year: 1977
    Length: 124
    Genre: scifi
XML representation
<Movies>
  <movie title="Gone with the wind">
    <year>1939</year>
    <length>281</length>
    <genre>drama</genre>
  </movie>
  <movie title="Star Wars">
    <year>1977</year>
    <length>124</length>
    <genre>scifi</genre>
  </movie>
</Movies>
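To show that the XML above really is a tree, here is a minimal sketch using Python's standard `xml.etree.ElementTree`: each `<movie>` is a child of the root, its title is an attribute, and its other fields are child elements.

```python
import xml.etree.ElementTree as ET

# The XML document from the slide (well-formed: quoted attributes, matching tag case).
MOVIES_XML = """
<Movies>
  <movie title="Gone with the wind">
    <year>1939</year>
    <length>281</length>
    <genre>drama</genre>
  </movie>
  <movie title="Star Wars">
    <year>1977</year>
    <length>124</length>
    <genre>scifi</genre>
  </movie>
</Movies>
"""

root = ET.fromstring(MOVIES_XML.strip())

# Walk the tree: one dict per <movie> node.
movies = []
for movie in root.findall("movie"):
    movies.append({
        "title": movie.get("title"),           # attribute
        "year": int(movie.findtext("year")),   # child element text
        "length": int(movie.findtext("length")),
        "genre": movie.findtext("genre"),
    })

for m in movies:
    print(m["title"], m["year"], m["genre"])
```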
Other Data Models
- Hierarchical model
  - Can be used to represent taxonomies
☆ Each record stores its parent's ID as metadata
(Slide figures: pictorial representation and relational representation)
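A minimal sketch of the "parent ID as metadata" idea stored in a relational table, using SQLite through Python's `sqlite3`. The `category` table and its taxonomy rows are hypothetical; the recursive query (`WITH RECURSIVE`) walks down the parent links to collect a whole subtree.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Each row stores its parent's id (NULL for the root).
conn.execute("CREATE TABLE category (id INTEGER PRIMARY KEY, name TEXT, "
             "parent_id INTEGER REFERENCES category(id))")
conn.executemany("INSERT INTO category VALUES (?, ?, ?)", [
    (1, "Animal", None),
    (2, "Mammal", 1),
    (3, "Bird", 1),
    (4, "Dog", 2),
    (5, "Cat", 2),
])

# Start at "Mammal" and repeatedly follow parent_id links downward.
rows = conn.execute("""
    WITH RECURSIVE subtree(id, name, depth) AS (
        SELECT id, name, 0 FROM category WHERE name = 'Mammal'
        UNION ALL
        SELECT c.id, c.name, s.depth + 1
        FROM category AS c JOIN subtree AS s ON c.parent_id = s.id
    )
    SELECT name, depth FROM subtree ORDER BY depth, name
""").fetchall()
print(rows)  # [('Mammal', 0), ('Cat', 1), ('Dog', 1)]
```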
Other Data Models
- Network model: differs from the relational model in that data are represented by:
  - a collection of records
  - links that represent the relationships among the data
(Slide figure: schema with Customer and Account records)

Defining Schema in SQL
Data types -letters-
- Character strings
  - CHAR(n): a fixed-length string of n characters. Use this if you KNOW how many characters will be stored.
  - VARCHAR(n): a variable-length string of up to n characters. Use this if you do NOT know how many characters will be stored.
- Bit strings
  - BIT(n): like CHAR(n), a fixed-length string of n bits
  - BIT VARYING(n): like VARCHAR(n), a string of up to n bits
Data types -math-
- BOOLEAN = {TRUE, FALSE}
- INTEGER (INT)
- SHORTINT: like INTEGER, but with a smaller range (called SMALLINT in standard SQL)
- FLOAT
- DOUBLE (DOUBLE PRECISION)
- DECIMAL(n, d): a fixed-point number with n decimal digits in total, d of them after the decimal point (e.g., DECIMAL(6, 2) can hold 1234.56)
- NUMERIC(n, d): essentially the same as DECIMAL
Data types -time-
- DATE: written as 'yyyy-mm-dd'
- TIME: written as 'HH:mm:ss' or 'HH:mm:ss.d', where d is a fraction of a second
- TIMESTAMP: written as 'yyyy-mm-dd HH:mm:ss'
Creating Tables
- Syntax in SQL
  - In general
    CREATE TABLE table_name (
        attribute1 data_type PRIMARY KEY,
        attribute2 data_type DEFAULT value,
        attribute3 data_type, ...
    );
  - In example
    CREATE TABLE Movie (
        title  VARCHAR(50) PRIMARY KEY,  -- makes title the (unique) key
        year   INT DEFAULT 0000,         -- sets the initial value of year to 0000
        length INT
    );
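The `Movie` example can be tried in SQLite from Python as a quick sketch (SQLite accepts `VARCHAR(50)` and `INT` as type names). It shows the two effects noted above: an omitted `year` takes the `DEFAULT`, and a second row with the same `title` is rejected by the `PRIMARY KEY`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Movie (
        title  VARCHAR(50) PRIMARY KEY,
        year   INT DEFAULT 0000,
        length INT
    )
""")

# year is omitted, so the DEFAULT value 0000 (stored as the integer 0) is used.
conn.execute("INSERT INTO Movie (title, length) VALUES ('Star Wars', 124)")
row = conn.execute("SELECT title, year, length FROM Movie").fetchone()
print(row)  # ('Star Wars', 0, 124)

# title is the PRIMARY KEY, so inserting the same title again is rejected.
try:
    conn.execute("INSERT INTO Movie VALUES ('Star Wars', 1977, 124)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
print(duplicate_rejected)  # True
```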

Relational Operations

UNION
Union
- Basic rules of UNION
  - The number and the order of columns MUST be the SAME in both queries
  - The data types of corresponding columns MUST be the SAME or compatible
  - The result's column names are usually taken from the first query
- Compatible: (Title VARCHAR, Year INT, Length INT) U (Title VARCHAR, Year INT, Length INT)
- Not compatible: (Title VARCHAR, Year TIME, Length INT) U (Title VARCHAR, Year INT, Length INT), because Year is TIME in one table and INT in the other
Syntax
- In general
  SELECT attribute1, attribute2
  FROM Table1
  UNION
  SELECT attribute1, attribute2
  FROM Table2
- In example
  SELECT prod_code, prod_name, com_name
  FROM Product
  UNION
  SELECT prod_code, prod_name, com_name
  FROM Purchase
Example -table-
Purchase:
PUR_# | PROD_CODE | PROD_NAME | COM_NAME | PUR_QTY | PUR_AMOUNT
2 | PR001 | TV | SONY | 15 | 450000
1 | PR003 | iPod | PHILIPS | 20 | 60000
3 | PR007 | laptop | HP | 6 | 240000
4 | PR005 | mobile | NOKIA | 100 | 300000
5 | PR002 | DVD player | LG | 10 | 30000
6 | PR006 | Sound system | CREATIVE | 8 | 40000
Products:
PROD_CODE | PROD_NAME | COM_NAME | LIFE
PR001 | TV | SONY | 7
PR002 | DVD player | LG | 9
PR003 | iPod | PHILIPS | 9
PR004 | Sound system | CREATIVE | 8
PR005 | mobile | NOKIA | 6
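Example -output- (Products UNION Purchase)
PROD_CODE | PROD_NAME | COM_NAME
PR001 | TV | SONY
PR002 | DVD player | LG
PR003 | iPod | PHILIPS
PR004 | Sound system | CREATIVE
PR005 | mobile | NOKIA
PR006 | Sound system | CREATIVE
PR007 | laptop | HP
Rows that appear in both tables (e.g., PR001 TV SONY) appear only once in the result.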
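The example above can be reproduced as a small sketch with SQLite from Python, loading the two slide tables as given. `UNION` removes duplicate rows, so seven distinct products remain; `UNION ALL` (shown for contrast) keeps all 5 + 6 = 11 rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE product (prod_code TEXT, prod_name TEXT, com_name TEXT, life INT)")
conn.execute("CREATE TABLE purchase (pur_no INT, prod_code TEXT, prod_name TEXT, "
             "com_name TEXT, pur_qty INT, pur_amount INT)")
conn.executemany("INSERT INTO product VALUES (?, ?, ?, ?)", [
    ("PR001", "TV", "SONY", 7),
    ("PR002", "DVD player", "LG", 9),
    ("PR003", "iPod", "PHILIPS", 9),
    ("PR004", "Sound system", "CREATIVE", 8),
    ("PR005", "mobile", "NOKIA", 6),
])
conn.executemany("INSERT INTO purchase VALUES (?, ?, ?, ?, ?, ?)", [
    (2, "PR001", "TV", "SONY", 15, 450000),
    (1, "PR003", "iPod", "PHILIPS", 20, 60000),
    (3, "PR007", "laptop", "HP", 6, 240000),
    (4, "PR005", "mobile", "NOKIA", 100, 300000),
    (5, "PR002", "DVD player", "LG", 10, 30000),
    (6, "PR006", "Sound system", "CREATIVE", 8, 40000),
])

# UNION removes duplicate rows.
union_rows = conn.execute("""
    SELECT prod_code, prod_name, com_name FROM product
    UNION
    SELECT prod_code, prod_name, com_name FROM purchase
    ORDER BY prod_code
""").fetchall()
for row in union_rows:
    print(row)

# UNION ALL keeps duplicates: 5 + 6 rows.
union_all_count = conn.execute("""
    SELECT COUNT(*) FROM (
        SELECT prod_code, prod_name, com_name FROM product
        UNION ALL
        SELECT prod_code, prod_name, com_name FROM purchase
    )
""").fetchone()[0]
print(union_all_count)  # 11
```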
UNION with different column names
SELECT prod_code, prod_name, life
FROM product
WHERE life > 6
UNION
SELECT prod_code, prod_name, pur_qty
FROM purchase
WHERE pur_qty < 20
- Product(PROD_CODE, PROD_NAME, COM_NAME, LIFE int)
- Purchase(PUR_#, PROD_CODE, PROD_NAME, COM_NAME, PUR_QTY int, PUR_AMOUNT)
- The two queries use two different criteria (LIFE and PUR_QTY) and select different third columns.
- BUT NOTE: both columns hold INTEGER values, so they are type-compatible and the UNION is accepted.
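UNION with different column names -output-
PROD_CODE | PROD_NAME | LIFE
PR001 | TV | 7 (from PRODUCT.LIFE)
PR001 | TV | 15 (from PURCHASE.PUR_QTY)
PR002 | DVD player | 9 (from PRODUCT.LIFE)
PR002 | DVD player | 10 (from PURCHASE.PUR_QTY)
PR003 | iPod | 9 (from PRODUCT.LIFE)
PR004 | Sound system | 8 (from PRODUCT.LIFE)
PR006 | Sound system | 8 (from PURCHASE.PUR_QTY)
PR007 | laptop | 6 (from PURCHASE.PUR_QTY)
BE CAREFUL: in most cases this is an unwelcome result. The column is named LIFE, but it mixes product lifetimes with purchase quantities.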
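A sketch of the same query in SQLite, keeping only the columns the query needs (an assumption to keep it short). The result's third column is named `life` because names come from the first SELECT, even though half of its values are purchase quantities.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Trimmed versions of the slide tables: only the columns used by the query.
conn.execute("CREATE TABLE product (prod_code TEXT, prod_name TEXT, life INT)")
conn.execute("CREATE TABLE purchase (prod_code TEXT, prod_name TEXT, pur_qty INT)")
conn.executemany("INSERT INTO product VALUES (?, ?, ?)", [
    ("PR001", "TV", 7), ("PR002", "DVD player", 9), ("PR003", "iPod", 9),
    ("PR004", "Sound system", 8), ("PR005", "mobile", 6),
])
conn.executemany("INSERT INTO purchase VALUES (?, ?, ?)", [
    ("PR001", "TV", 15), ("PR003", "iPod", 20), ("PR007", "laptop", 6),
    ("PR005", "mobile", 100), ("PR002", "DVD player", 10), ("PR006", "Sound system", 8),
])

cur = conn.execute("""
    SELECT prod_code, prod_name, life FROM product WHERE life > 6
    UNION
    SELECT prod_code, prod_name, pur_qty FROM purchase WHERE pur_qty < 20
    ORDER BY prod_code, life
""")
column_names = [d[0] for d in cur.description]  # taken from the first SELECT
mixed_rows = cur.fetchall()
print(column_names)  # ['prod_code', 'prod_name', 'life']
for row in mixed_rows:
    print(row)
```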

INTERSECTION

Selection and Projection
Selection
- R1 := σ_C(R2)
- C is a condition (as in an if-statement) that refers to attributes of R2
- R1 is all those tuples of R2 that satisfy C
- SQL form
  SELECT * FROM R2 WHERE C
Selection
R2:
Bar | Beer | Price
Joe's | Bud | 2.50
Joe's | Miller | 2.75
Sue's | Bud | 2.50
Sue's | Miller | 3.00
C: Bar = 'Joe's'
R1:
Bar | Beer | Price
Joe's | Bud | 2.50
Joe's | Miller | 2.75
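A minimal SQLite sketch of this selection, using the table name R2 from the slide and the column names bar, beer, price. Note that a single quote inside an SQL string literal is written twice (`'Joe''s'`).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE R2 (bar TEXT, beer TEXT, price REAL)")
conn.executemany("INSERT INTO R2 VALUES (?, ?, ?)", [
    ("Joe's", "Bud", 2.50),
    ("Joe's", "Miller", 2.75),
    ("Sue's", "Bud", 2.50),
    ("Sue's", "Miller", 3.00),
])

# R1 := σ_{bar = 'Joe's'}(R2): keep only the rows that satisfy the condition.
r1 = conn.execute("SELECT * FROM R2 WHERE bar = 'Joe''s' ORDER BY beer").fetchall()
print(r1)  # [("Joe's", 'Bud', 2.5), ("Joe's", 'Miller', 2.75)]
```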
Projection
- R1 := π_L(R2)
- R1 is constructed by looking at each tuple of R2, extracting the attributes on list L in the order specified, and creating from those components a tuple for R1
- Duplicate tuples, if any, are eliminated
- SQL form
  SELECT DISTINCT L FROM R2
  (a plain SELECT L FROM R2 keeps duplicate rows, because SQL tables are bags, not sets)
Projection
R2:
Bar | Beer | Price
Joe's | Bud | 2.50
Joe's | Miller | 2.75
Sue's | Bud | 2.50
Sue's | Miller | 3.00
Extracting L = (Beer, Price):
Beer | Price
Bud | 2.50
Miller | 2.75
Bud | 2.50
Miller | 3.00
After deleting the duplicate, R1:
Beer | Price
Bud | 2.50
Miller | 2.75
Miller | 3.00
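A short SQLite sketch contrasting the two SQL forms on the slide's R2: plain `SELECT beer, price` returns all four rows (bag semantics), while `SELECT DISTINCT` matches the relational-algebra projection with the duplicate deleted.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE R2 (bar TEXT, beer TEXT, price REAL)")
conn.executemany("INSERT INTO R2 VALUES (?, ?, ?)", [
    ("Joe's", "Bud", 2.50),
    ("Joe's", "Miller", 2.75),
    ("Sue's", "Bud", 2.50),
    ("Sue's", "Miller", 3.00),
])

# Plain SELECT keeps duplicates ("Bud", 2.5 appears twice).
bag = conn.execute("SELECT beer, price FROM R2").fetchall()
# SELECT DISTINCT eliminates them, like π_{beer, price}(R2).
projection = conn.execute(
    "SELECT DISTINCT beer, price FROM R2 ORDER BY beer, price").fetchall()

print(len(bag))    # 4
print(projection)  # [('Bud', 2.5), ('Miller', 2.75), ('Miller', 3.0)]
```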

PRODUCT and JOIN
CROSS PRODUCT
- Considers ALL possible combinations of rows from two or more tables
  - # of rows in Table1 = x
  - # of rows in Table2 = y
  - # of rows in the result table = x * y
Syntax
- In general
  SELECT T1.A1, T1.A2, T2.A1, T2.A2, ...
  FROM T1
  CROSS JOIN T2
- In example
  SELECT Eats.pizza, Eats.name, Person.age, Person.gender, Person.name
  FROM Eats
  CROSS JOIN Person
- Eats has 9 rows and Person has 20 rows → the result has 9 * 20 = 180 rows
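A quick SQLite check of the x * y rule. The table shapes follow the slide's Eats and Person example, but the 9 and 20 rows of data are hypothetical filler, since only the row counts matter here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Eats (name TEXT, pizza TEXT)")
conn.execute("CREATE TABLE Person (name TEXT, age INT, gender TEXT)")

# Hypothetical filler data: 9 rows in Eats and 20 rows in Person, as in the slide.
conn.executemany("INSERT INTO Eats VALUES (?, ?)",
                 [(f"eater{i}", f"pizza{i % 3}") for i in range(9)])
conn.executemany("INSERT INTO Person VALUES (?, ?, ?)",
                 [(f"person{i}", 20 + i, "F" if i % 2 else "M") for i in range(20)])

# Every Eats row is paired with every Person row.
count = conn.execute("SELECT COUNT(*) FROM Eats CROSS JOIN Person").fetchone()[0]
print(count)  # 180
```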
EQUI-JOIN
- An equi-join joins tables on the equality of matching column values in the associated tables
- An equal sign (=) is used as the comparison operator in the WHERE clause:
  SELECT * FROM t1, t2 WHERE t1.attr1 = t2.attr2
- An equi-join can also be written with JOIN followed by ON, naming the columns (qualified by their tables) to compare for equality
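EQUI-JOIN
T1:
ID | Attribute1
2 | A2
5 | A5
3 | A3
1 | -----
4 | A4
T2:
ID | Attribute2
5 | B5
1 | B1
3 | -----
6 | B6
2 | B2
5 | C4
Result of either query below:
ID | Attribute1 | ID | Attribute2
1 | ----- | 1 | B1
2 | A2 | 2 | B2
3 | A3 | 3 | -----
5 | A5 | 5 | B5
5 | A5 | 5 | C4
SELECT *
FROM T1
JOIN T2
ON T1.ID = T2.ID

SELECT *
FROM T1, T2
WHERE T1.ID = T2.ID
NOTE
- Neither ID column is eliminated: the result contains both T1.ID and T2.ID.
- ID 5 in T1 matches two rows with ID 5 in T2, so the T1 row with ID 5 appears twice.
- IDs 4 (only in T1) and 6 (only in T2) have no match, so they do not appear.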
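A sketch in SQLite confirming that the `JOIN ... ON` form and the comma-plus-`WHERE` form give the same five rows, with both ID columns kept. The slide's "-----" entries are stored as NULL (`None` in Python), which is an interpretation of the blank cells.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE T1 (ID INTEGER, Attribute1 TEXT)")
conn.execute("CREATE TABLE T2 (ID INTEGER, Attribute2 TEXT)")
# "-----" on the slide is stored here as NULL.
conn.executemany("INSERT INTO T1 VALUES (?, ?)",
                 [(2, "A2"), (5, "A5"), (3, "A3"), (1, None), (4, "A4")])
conn.executemany("INSERT INTO T2 VALUES (?, ?)",
                 [(5, "B5"), (1, "B1"), (3, None), (6, "B6"), (2, "B2"), (5, "C4")])

order = " ORDER BY T1.ID, T2.Attribute2"
join_on = conn.execute("SELECT * FROM T1 JOIN T2 ON T1.ID = T2.ID" + order).fetchall()
join_where = conn.execute("SELECT * FROM T1, T2 WHERE T1.ID = T2.ID" + order).fetchall()

print(join_on == join_where)  # True: both forms express the same equi-join
for row in join_on:
    print(row)  # 4 columns per row: T1.ID, Attribute1, T2.ID, Attribute2
```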
Natural Join
- A natural join is a type of EQUI-JOIN
- It is structured so that columns with the same name in the associated tables appear only once in the result
  - No duplicated column names
- Guidelines
  - The associated tables have one or more pairs of identically named columns
  - Those columns MUST have the same data type
  - Do not use an ON clause in a natural join
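Natural JOIN
T1:
ID | Attribute1
2 | A2
5 | A5
3 | A3
1 | -----
4 | A4
T2:
ID | Attribute2
5 | B5
1 | B1
3 | -----
6 | B6
2 | B2
5 | C4
Result:
ID | Attribute1 | Attribute2
1 | ----- | B1
2 | A2 | B2
3 | A3 | -----
5 | A5 | B5
5 | A5 | C4
SELECT *
FROM T1
NATURAL JOIN T2;
This gives the same result as the equi-join that lists one ID column explicitly:
SELECT T1.ID, T1.Attribute1, T2.Attribute2
FROM T1, T2
WHERE T1.ID = T2.ID
NOTE
One of the ID columns IS eliminated.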
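A SQLite sketch of the natural join on the same T1 and T2 (with "-----" stored as NULL, an interpretation). SQLite matches the common column `ID` automatically and, for `SELECT *`, omits the right-hand copy of it, so only three columns remain.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE T1 (ID INTEGER, Attribute1 TEXT)")
conn.execute("CREATE TABLE T2 (ID INTEGER, Attribute2 TEXT)")
conn.executemany("INSERT INTO T1 VALUES (?, ?)",
                 [(2, "A2"), (5, "A5"), (3, "A3"), (1, None), (4, "A4")])
conn.executemany("INSERT INTO T2 VALUES (?, ?)",
                 [(5, "B5"), (1, "B1"), (3, None), (6, "B6"), (2, "B2"), (5, "C4")])

# No ON clause: the join condition is T1.ID = T2.ID because ID is the shared column name.
cur = conn.execute("SELECT * FROM T1 NATURAL JOIN T2 ORDER BY T1.ID, Attribute2")
columns = [d[0] for d in cur.description]
rows = cur.fetchall()

print(columns)  # ['ID', 'Attribute1', 'Attribute2']
for row in rows:
    print(row)
```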
Theta-Join
- A theta join allows an arbitrary comparison relation
  - such as {<=, >=, <, >, =, !=}
- Relational algebra notation: R3 := R1 ⋈_C R2
  - where C is any Boolean-valued condition
- Take R1 × R2, then apply selection with condition C, i.e., R1 ⋈_C R2 = σ_C(R1 × R2)
Theta Join
R1:
Bar | Beer | Price
Joe's | Bud | 2.50
Joe's | Miller | 2.75
Sue's | Bud | 2.50
Sue's | Coors | 3.00
R2:
Name | Addr
Joe's | Maple St
Sue's | River St
C: R1.Bar = R2.Name
R1 ⋈_C R2:
Bar | Beer | Price | Name | Addr
Joe's | Bud | 2.50 | Joe's | Maple St
Joe's | Miller | 2.75 | Joe's | Maple St
Sue's | Bud | 2.50 | Sue's | River St
Sue's | Coors | 3.00 | Sue's | River St
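A SQLite sketch of the definition R1 ⋈_C R2 = σ_C(R1 × R2): the `JOIN ... ON` form and the `CROSS JOIN ... WHERE` form return the same rows. The last query uses `<>` (not equal) as a second, hypothetical condition to show that C need not be an equality.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE R1 (bar TEXT, beer TEXT, price REAL)")
conn.execute("CREATE TABLE R2 (name TEXT, addr TEXT)")
conn.executemany("INSERT INTO R1 VALUES (?, ?, ?)", [
    ("Joe's", "Bud", 2.50), ("Joe's", "Miller", 2.75),
    ("Sue's", "Bud", 2.50), ("Sue's", "Coors", 3.00),
])
conn.executemany("INSERT INTO R2 VALUES (?, ?)",
                 [("Joe's", "Maple St"), ("Sue's", "River St")])

order = " ORDER BY R1.bar, R1.beer"
theta = conn.execute("SELECT * FROM R1 JOIN R2 ON R1.bar = R2.name" + order).fetchall()
# Same thing written as a selection over the cross product.
select_of_product = conn.execute(
    "SELECT * FROM R1 CROSS JOIN R2 WHERE R1.bar = R2.name" + order).fetchall()

# Any comparison works: pair each R1 row with the bars it does NOT belong to.
not_equal_count = conn.execute(
    "SELECT COUNT(*) FROM R1 JOIN R2 ON R1.bar <> R2.name").fetchone()[0]

print(theta == select_of_product)  # True
print(not_equal_count)             # 4
```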
Other Join --

Normalization
Normalization
- Why do we need to normalize data?
  - To reduce redundancy and dependency
No normalization
- Problems without normalization
  - Anomalies (contradictions/inconsistencies) can happen:
    - Update anomaly
    - Insertion anomaly
    - Deletion anomaly
- Solution → normalization!
  - We need data normalization to reduce anomalies
Update anomaly
- An update anomaly is a data inconsistency that results from data redundancy and a partial update.
Update anomaly
Table: Employee
EmployeeID | Name | Department | StudentGroup
123 | J. Longfellow | Accounting | Beta Alpha Psi
234 | B. Rech | Marketing | Marketing Club
234 | B. Rech | Marketing | Marketing Manage Club
456 | A.Bruchs | CIS | Technology Org
456 | A.Bruchs | CIS | Beta Alpha Psi
What happens if you run the following update?
UPDATE Employee
SET Department = 'ECON'
WHERE StudentGroup = 'Technology Org'
Update anomaly
EmployeeID | Name | Department | StudentGroup
123 | J. Longfellow | Accounting | Beta Alpha Psi
234 | B. Rech | Marketing | Marketing Club
234 | B. Rech | Marketing | Marketing Manage Club
456 | A.Bruchs | ECON | Technology Org
456 | A.Bruchs | CIS | Beta Alpha Psi
When A.Bruchs's department is updated, say from CIS to ECON, the 5th row's department has to be updated too. Otherwise the data is inconsistent: the two rows for EmployeeID 456 can no longer describe the same person!!!
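A minimal SQLite reproduction of this anomaly with the slide's Employee rows: after the partial update, one EmployeeID is associated with two different departments.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employee (EmployeeID INT, Name TEXT, Department TEXT, StudentGroup TEXT)")
conn.executemany("INSERT INTO Employee VALUES (?, ?, ?, ?)", [
    (123, "J. Longfellow", "Accounting", "Beta Alpha Psi"),
    (234, "B. Rech", "Marketing", "Marketing Club"),
    (234, "B. Rech", "Marketing", "Marketing Manage Club"),
    (456, "A.Bruchs", "CIS", "Technology Org"),
    (456, "A.Bruchs", "CIS", "Beta Alpha Psi"),
])

# The partial update from the slide: only the Technology Org row changes.
conn.execute("UPDATE Employee SET Department = 'ECON' WHERE StudentGroup = 'Technology Org'")

depts = sorted(d for (d,) in conn.execute(
    "SELECT DISTINCT Department FROM Employee WHERE EmployeeID = 456"))
print(depts)  # ['CIS', 'ECON']: one employee, two departments
```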
Another Update Anomaly
S_id | S_name | S_address | Subj_opted
401 | Adam | Noida | Bio
402 | Alex | Panipat | Math
403 | Stuart | Jammu | Math
404 | Adam | Noida | Physics
- To update the address of a student who appears in two or more rows, we need to check ALL ROWS.
- If one of those rows is not updated, Adam lives in two different places → inconsistency.
Insertion Anomaly
- Insertion anomaly: the inability to add data to the DB due to the absence of other data
Insertion Anomaly
- This company hires Roy, who has not decided on a student group yet
  INSERT INTO Employee (EmployeeID, Name, Department, StudentGroup)
  VALUES (125, 'Roy', 'Math', NULL) → ERROR
  - EmployeeID alone repeats in this table, so the key must be {EmployeeID, StudentGroup}, and a key column cannot be NULL.
- We need a smaller table that stores only employees, instead of employees together with their student groups, departments, etc.
EmployeeID | Name | Department | StudentGroup
123 | J. Longfellow | Accounting | Beta Alpha Psi
234 | B. Rech | Marketing | Marketing Club
234 | B. Rech | Marketing | Marketing Manage Club
456 | A.Bruchs | CIS | Technology Org
456 | A.Bruchs | CIS | Beta Alpha Psi
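A sketch of the anomaly and the fix in SQLite, assuming the key of the unnormalized table is (EmployeeID, StudentGroup). The decomposed tables `Emp` and `Membership` are hypothetical names for the "smaller table" idea: Roy can be stored without any student group.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed key: (EmployeeID, StudentGroup). NOT NULL is spelled out because SQLite,
# unlike most DBMSs, otherwise allows NULL in non-INTEGER primary key columns.
conn.execute("""
    CREATE TABLE Employee (
        EmployeeID   INT  NOT NULL,
        Name         TEXT,
        Department   TEXT,
        StudentGroup TEXT NOT NULL,
        PRIMARY KEY (EmployeeID, StudentGroup)
    )
""")
try:
    conn.execute("INSERT INTO Employee VALUES (125, 'Roy', 'Math', NULL)")
    roy_rejected = False
except sqlite3.IntegrityError:
    roy_rejected = True
print(roy_rejected)  # True

# Hypothetical decomposition: employees in one table, group memberships in another.
conn.execute("CREATE TABLE Emp (EmployeeID INT PRIMARY KEY, Name TEXT, Department TEXT)")
conn.execute("CREATE TABLE Membership (EmployeeID INT REFERENCES Emp(EmployeeID), "
             "StudentGroup TEXT NOT NULL)")
conn.execute("INSERT INTO Emp VALUES (125, 'Roy', 'Math')")  # no student group needed
roy = conn.execute("SELECT * FROM Emp WHERE EmployeeID = 125").fetchone()
print(roy)  # (125, 'Roy', 'Math')
```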
Deletion Anomaly
- A deletion anomaly is the unintended loss of data due to the deletion of other data.
Deletion anomaly
EmployeeID | Name | Department | StudentGroup
123 | J. Longfellow | Accounting | Beta Alpha Psi
234 | B. Rech | Marketing | Marketing Club
234 | B. Rech | Marketing | Marketing Manage Club
456 | A.Bruchs | CIS | Technology Org
456 | A.Bruchs | CIS | Beta Alpha Psi
- What happens if you execute:
  DELETE FROM Employee
  WHERE StudentGroup = 'Beta Alpha Psi'
Deletion Anomaly
- The DELETE removes both Beta Alpha Psi rows: (123, J. Longfellow, Accounting, Beta Alpha Psi) and (456, A.Bruchs, CIS, Beta Alpha Psi).
- J. Longfellow had no other row, so he no longer exists (as data)!!!
Remaining rows:
EmployeeID | Name | Department | StudentGroup
234 | B. Rech | Marketing | Marketing Club
234 | B. Rech | Marketing | Marketing Manage Club
456 | A.Bruchs | CIS | Technology Org
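The same DELETE run against the slide's Employee rows in SQLite, as a quick sketch: deleting a group membership also erases every fact about J. Longfellow.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employee (EmployeeID INT, Name TEXT, Department TEXT, StudentGroup TEXT)")
conn.executemany("INSERT INTO Employee VALUES (?, ?, ?, ?)", [
    (123, "J. Longfellow", "Accounting", "Beta Alpha Psi"),
    (234, "B. Rech", "Marketing", "Marketing Club"),
    (234, "B. Rech", "Marketing", "Marketing Manage Club"),
    (456, "A.Bruchs", "CIS", "Technology Org"),
    (456, "A.Bruchs", "CIS", "Beta Alpha Psi"),
])

conn.execute("DELETE FROM Employee WHERE StudentGroup = 'Beta Alpha Psi'")

longfellow_rows = conn.execute(
    "SELECT COUNT(*) FROM Employee WHERE Name = 'J. Longfellow'").fetchone()[0]
remaining = conn.execute("SELECT COUNT(*) FROM Employee").fetchone()[0]
print(longfellow_rows)  # 0: the employee himself is gone
print(remaining)        # 3
```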
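Functional Dependencies
- Trivial functional dependency
  - (A, B, C): B determines B == knowing B, we can find B
  - An FD X → Y is trivial when Y is a subset of X
- Partial functional dependency
  - (A, B, C): B determines C == knowing B, we can find C, where B is only part of a composite key such as {A, B}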
Functional Dependencies
- Full functional dependency
  - (A, B, C): A determines B AND C == knowing A, we can find every non-key attribute
- Transitive dependency
  - (A, B, C): A determines B and B determines C (so A determines C through B)
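A small sketch of what "X determines Y" means on actual data: any two rows that agree on X must agree on Y. The helper `fd_holds` is hypothetical, and the rows are the student table from "Another Update Anomaly". A table instance can only disprove an FD; showing it holds for all possible data needs the schema's semantics.

```python
def fd_holds(rows, lhs, rhs):
    """Return False if two rows agree on lhs but differ on rhs in this table instance."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        # The first value seen for this key must match every later one.
        if seen.setdefault(key, value) != value:
            return False
    return True


students = [
    {"S_id": 401, "S_name": "Adam", "S_address": "Noida", "Subj_opted": "Bio"},
    {"S_id": 402, "S_name": "Alex", "S_address": "Panipat", "Subj_opted": "Math"},
    {"S_id": 403, "S_name": "Stuart", "S_address": "Jammu", "Subj_opted": "Math"},
    {"S_id": 404, "S_name": "Adam", "S_address": "Noida", "Subj_opted": "Physics"},
]

print(fd_holds(students, ["S_name"], ["S_name"]))      # True (trivial)
print(fd_holds(students, ["S_name"], ["S_address"]))   # True
print(fd_holds(students, ["S_name"], ["Subj_opted"]))  # False: Adam has Bio and Physics
print(fd_holds(students, ["S_id"], ["S_name", "S_address", "Subj_opted"]))  # True
```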
First Normal Form (1NF)
- Definition of 1NF
  - A relation is in 1NF if it satisfies the following conditions:
    - Each column holds an atomic (single) value, so no row contains a repeating group of information
    - Each row is unique, so that no two rows can be fetched by the same set of column values
2nd Normal Form (2NF)
- Definition: a relation is in 2NF if it satisfies the following conditions:
  - It is in 1NF
  - All non-key attributes are fully functionally dependent on the primary key
    - The whole primary key has to be needed to determine each of the other attributes.
- A functional dependency that holds in a relation is partial when removing one of the determining attributes still gives a functional dependency that holds in the relation.
  - If {A, B} → {C} but also {A} → {C}, then {C} is partially functionally dependent on {A, B}.
☆ A 2NF relation can still contain transitive dependencies.
3rd Normal Form (3NF)
- A relation is in 3NF if it satisfies the following conditions:
  - It is in 2NF
  - There is no transitive dependency
- Transitive dependency in a relation (A, B, C):
  - A determines B: B = f(A)
  - B determines C: C = h(B)
  - Transitively, A determines C: C = h(f(A))
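The transitive step C = h(f(A)) is exactly what the standard attribute-closure algorithm computes. A minimal sketch (the function name `closure` and the FD encoding as (lhs, rhs) set pairs are assumptions): keep adding right-hand sides whose left-hand sides are already known until nothing changes.

```python
def closure(attrs, fds):
    """Attribute closure: every attribute determined by attrs under the given FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If all of lhs is already known, rhs is determined too.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


fds = [({"A"}, {"B"}), ({"B"}, {"C"})]  # A -> B (f), B -> C (h)
print(sorted(closure({"A"}, fds)))  # ['A', 'B', 'C']: A -> C holds transitively
print(sorted(closure({"B"}, fds)))  # ['B', 'C']: B is not a key of (A, B, C)
```

Because B determines C without being a key, the relation (A, B, C) with key A has a transitive dependency and is not in 3NF.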
BCNF
- Determinant: any attribute (simple or composite) on which some other attribute is fully functionally dependent.
- BCNF definition:
  - A relation R is in BCNF if and only if every determinant is a candidate key.
- Note: 3NF does not deal with relations where:
  - the relation has multiple candidate keys,
  - those candidate keys are composite, and
  - the candidate keys overlap.
- BCNF eliminates anomalies in those cases, i.e., it deals with cases that 3NF cannot.
BCNF-Example
- Table = Supplier(supplier_no, supplier_name, city, zip)
- supplier_no is unique and supplier_name is unique, so both are candidate keys
- city = h1(supplier_no) = g1(supplier_name)
- zip = h2(supplier_no) = g2(supplier_name)
- supplier_name = h3(supplier_no)
- supplier_no = g3(supplier_name)
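A sketch of a BCNF test for this example, using the common textbook formulation "the left-hand side of every non-trivial FD is a superkey" (checked via attribute closure; for a single relation, testing only the listed FDs is sufficient). Function names are hypothetical. Supplier passes because both determinants are candidate keys; the (A, B, C) relation with A → B, B → C fails.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds (a list of (lhs, rhs) set pairs)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def is_bcnf(attributes, fds):
    """True if every non-trivial FD's determinant determines all attributes."""
    for lhs, rhs in fds:
        if set(rhs) <= set(lhs):
            continue  # trivial FDs never violate BCNF
        if closure(lhs, fds) != set(attributes):
            return False
    return True


supplier = {"supplier_no", "supplier_name", "city", "zip"}
supplier_fds = [
    ({"supplier_no"}, {"supplier_name", "city", "zip"}),  # h1, h2, h3
    ({"supplier_name"}, {"supplier_no", "city", "zip"}),  # g1, g2, g3
]
print(is_bcnf(supplier, supplier_fds))  # True

abc_fds = [({"A"}, {"B"}), ({"B"}, {"C"})]
print(is_bcnf({"A", "B", "C"}, abc_fds))  # False: B is a determinant but not a key
```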
Possible Anomalies in BCNF
- INSERT: we cannot record the city for a supplier_no without also knowing the supplier_name.
- DELETE: if we delete the row for a given supplier_name, we lose the information that the supplier_no is associated with a given city.
- UPDATE: since supplier_name is a candidate key (unique), there are none.
Source: http://www2.york.psu.edu/~lxn/IST_210/normal_form_definitions.html
- Possible solution
  - Decompose Supplier into two tables:
    - SUPPLIER_INFO(supplier_no, city, zip)
    - SUPPLIER_NAME(supplier_no, supplier_name)

Representation
Representation
- SQL representation
  SELECT movietitle
  FROM (SELECT starname, movietitle FROM starIn) a,
       (SELECT name FROM moviestar WHERE birthdate LIKE '%1974%') b
  WHERE a.starname = b.name
- Relational algebra
  π_movietitle( π_{starname, movietitle}(StarIn) ⋈_{starname = name} π_name(σ_{birthdate LIKE '%1974%'}(MovieStar)) )
- Query tree
  π movietitle
    ⋈ starname = name
      π starname, movietitle
        StarIn
      π name
        σ birthdate LIKE '%1974%'
          MovieStar
These 3 different representations show the same query.

Disk
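Structure of disk
- Data is read one unit at a time, because reading with magnetic current is not fully reliable
  - If a read fails, the disk has to go back and recover the data
- Cylinder (non-physical): the set of tracks at the same arm position on all platters; a logical grouping rather than a physical part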
All references
- https://iamcam.wordpress.com/2006/03/17/storing-hierarchical-data-in-a-database-part-1/
- http://codex.cs.yale.edu/avi/db-book/db6/appendices-dir/d.pdf
- http://www.w3resource.com/sql/sql-union.php
- http://blog.codinghorror.com/a-visual-explanation-of-sql-joins/
- http://infolab.stanford.edu/~ullman/fcdb/aut07/slides/ra.pdf
